
Even Faster Algorithm for the Chamfer Distance

Ying Feng MIT, Cambridge, MA, USA Piotr Indyk MIT, Cambridge, MA, USA
Abstract

For two $d$-dimensional point sets $A,B$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\mathsf{CH}(A,B)=\sum_{a\in A}\min_{b\in B}\|a-b\|$. The Chamfer distance is a widely used measure for quantifying dissimilarity between sets of points, used in many machine learning and computer vision applications. A recent work of Bakshi et al., NeurIPS'23, gave the first near-linear time $(1+\varepsilon)$-approximate algorithm, with a running time of $\mathcal{O}(nd\log(n)/\varepsilon^2)$. In this paper we improve the running time further, to $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$. When $\varepsilon$ is a constant, this reduces the gap between the upper bound and the trivial $\Omega(dn)$ lower bound significantly, from $\mathcal{O}(\log n)$ to $\mathcal{O}(\log\log n)$.

Keywords and phrases:
Chamfer distance
Category:
Track A: Algorithms, Complexity and Games
Funding:
Ying Feng: Supported by an MIT Akamai Presidential Fellowship.
Piotr Indyk: Supported in part by the NSF TRIPODS program (award DMS-2022448).
Copyright and License:
© Ying Feng and Piotr Indyk; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

For any two d-dimensional point sets A,B of sizes up to n, the Chamfer distance from A to B is defined as

$$\mathsf{CH}(A,B)=\sum_{a\in A}\min_{b\in B}\|a-b\|$$

where $\|\cdot\|$ is the underlying norm defining the distance between the points. The Chamfer distance and its variant, the Relaxed Earth Mover Distance [17, 6], are widely used metrics for quantifying the distance between two sets of points. These measures are especially popular in fields such as machine learning (e.g., [17, 19]) and computer vision (e.g., [7, 18, 11, 14]). A closely related notion of "the sum of maximum similarities", where $\min_{b\in B}\|a-b\|$ is replaced by $\max_{b\in B}\langle a,b\rangle$, has been recently popularized by the ColBERT system [15]. Efficient subroutines for computing Chamfer distances are provided in prominent libraries including PyTorch [2], PDAL [1] and TensorFlow [3]. In many applications (e.g., see [17]), the Chamfer distance is favored as a faster alternative to the more computationally intensive Earth-Mover Distance or Wasserstein Distance.

Despite the popularity of the Chamfer distance, efficient algorithms for computing it haven't attracted as much attention as algorithms for, say, the Earth-Mover Distance. The first improvement to the naive $\mathcal{O}(dn^2)$-time algorithm was obtained in [18], who utilized the fact that $\mathsf{CH}(A,B)$ can be computed by performing $|A|$ nearest neighbor queries in a data structure storing $B$. However, even when the state-of-the-art approximate nearest neighbor algorithms are used, this leads to a $(1+\epsilon)$-approximate estimator with an only slightly sub-quadratic running time of $\mathcal{O}\big(dn^{1+\frac{1}{2(1+\epsilon)^2-1}}\big)$ in high dimensions [5]. (All algorithms considered in this paper are randomized, and return $(1+\varepsilon)$-approximate answers with a constant probability.) The first near-linear-time algorithm for any dimension was proposed only recently in [8], who gave a $(1+\epsilon)$-approximation algorithm with a running time of $\mathcal{O}(dn\log(n)/\varepsilon^2)$, for the $\ell_1$ and $\ell_2$ norms. Since any algorithm for approximating the distance must run in at least $\Omega(dn)$ time (the Chamfer distance could be dominated by the distance from a single point $a\in A$ to $B$), the upper and lower running time bounds differed by a factor of $\log(n)/\varepsilon^2$.

Our result.

In this paper we make substantial progress towards reducing the gap between the upper and lower bounds for this problem. In particular, we show the following theorem. Assume a Word RAM model where both the input coordinates and the memory/processor words have $\mathcal{O}(\log n)$ bits. (In the Appendix, we adopt the reduction of [8] to extend the result to coordinates of arbitrary finite precision.) Then:

Theorem 1.

There is an algorithm that, given two sets $A,B$ of $d$-dimensional points with coordinates in $\{1,\dots,\mathrm{poly}(n)\}$ and a parameter $\varepsilon>0$, computes a $(1+\varepsilon)$-approximation to the Chamfer distance from $A$ to $B$ under the $\ell_1$ metric, in time

$$\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big).$$

The algorithm is randomized and is correct with a constant probability.

Thus, we reduce the gap between the upper and lower bounds from $\mathcal{O}(\log(n)/\varepsilon^2)$ to $\mathcal{O}\big((\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

1.1 Our techniques

Our result is obtained by identifying and overcoming the bottlenecks in the previous algorithm [8]. On a high level, that algorithm consists of two steps, described below. For the sake of exposition, in what follows we assume that the target approximation factor 1+ε is some constant.

Outline of the prior algorithm.

In the first step, for each point $a\in A$, the algorithm computes an estimate $\mathcal{D}_a$ of the distance $\mathsf{opt}_a$ from $a$ to its nearest neighbor in $B$. The estimate is $\mathcal{O}(\log n)$-approximate, meaning that $\mathsf{opt}_a\le\mathcal{D}_a\le\mathcal{O}(\log n)\cdot\mathsf{opt}_a$. This is achieved as follows. First, the algorithm imposes $\mathcal{O}(\log n)$ grids of side length $1,2,4,\dots$, and maps each point in $B$ to the corresponding cells. Then, for each $a$, it identifies the finest grid cell containing both $a$ and some point $b\in B$. Finally, it uses the distance between $a$ and $b$ as the estimate $\mathcal{D}_a$. To ensure that this process yields an $\mathcal{O}(\log n)$-approximation, each grid needs to be independently shifted at random. We emphasize that this independence between the shifts of different grids is crucial to ensure the $\mathcal{O}(\log n)$-approximation guarantee; the more natural approach of using "nested grids" does not work. The whole process takes $\mathcal{O}(nd)$ time per grid, or $\mathcal{O}(nd\log n)$ time overall.

In the second step, the algorithm estimates the Chamfer distance via importance sampling. Specifically, the algorithm samples $T$ points from $A$, such that the probability of sampling $a$ is proportional to the estimate $\mathcal{D}_a$. For each sampled point $a$, the distance $\mathsf{opt}_a$ from $a$ to its nearest neighbor in $B$ is computed directly in $\mathcal{O}(nd)$ time. The final estimate of the Chamfer distance is a weighted average of the $T$ values $\mathsf{opt}_a$. It can be shown that if the number of samples $T$ is equal to the distortion $\mathcal{O}(\log n)$ of the estimates $\mathcal{D}_a$, this yields a constant factor approximation to the Chamfer distance from $A$ to $B$. The overall cost of the second step is $\mathcal{O}(Tnd)=\mathcal{O}(nd\log n)$, i.e., asymptotically the same as the cost of the first step.
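To make the importance-sampling step concrete, the following is a minimal Python sketch of the estimator just described (our illustration, not the code of [8]); the crude estimates D and the exact nearest-neighbor subroutine nearest_dist are assumed to be given, and A is a list of points.

import random

def importance_sample_estimate(A, D, nearest_dist, T):
    """Estimate CH(A, B) from crude per-point estimates D[a] >= opt_a (sketch).

    Samples T points with probability proportional to D[a]; averaging
    opt_a / Pr[a] over the samples gives an unbiased estimate of sum_a opt_a.
    """
    total = sum(D[a] for a in A)
    samples = random.choices(A, weights=[D[a] for a in A], k=T)
    est = 0.0
    for a in samples:
        p_a = D[a] / total            # probability of sampling a
        est += nearest_dist(a) / p_a  # opt_a is computed exactly, in O(nd) time
    return est / T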

Intuitions behind the new algorithm.

To improve the running time, we need to reduce the cost of each of the two steps. In what follows we outline the obstacles to this task and how they can be overcome.

Step 1.

The main difficulty in reducing the cost of the first step is that, for each grid, the point-to-cell assignment takes $\mathcal{O}(nd)$ time to compute, so computing the assignments separately for all $\mathcal{O}(\log n)$ grids takes $\mathcal{O}(nd\log n)$ time. And, since each grid is independently translated by a different random vector, the grids are not nested, i.e., a (smaller) cell of side length $2^i$ might contain points from many (larger) cells of side length $2^{i+1}$. As a result, it is unclear how to reuse the point-to-cell assignment in one grid to speed up the assignment in another grid, so one seemingly has to compute them separately, at a total cost of $\mathcal{O}(nd\log n)$.

To overcome this difficulty, we abandon independent shifts and resort to $\mathcal{O}(\log n)$ nested grids. Such grids can be viewed as forming a quadtree with $\mathcal{O}(\log n)$ levels, where any cell $C$ at level $i+1$ (i.e., of side length $2^{i+1}$) is connected to the $2^d$ cells at level $i$ contained in $C$. (Note that the root node of the quadtree has the highest level $\mathcal{O}(\log n)$.) Although using a single quadtree increases the approximation error, we show that using two independently shifted quadtrees retains the $\mathcal{O}(\log n)$ approximation factor. That is, we repeat the process of finding the finest grid cell containing both $a$ and some point from $B$ twice, and return the point in $B$ that is closer to $a$. This amplifies the probability of finding a point from $B$ that is "close" to $a$, which translates into a better approximation factor compared to using a single quadtree.

We still need to show that the point-to-cell assignments can be computed efficiently. To this end, we observe that for each point $a$, its assignment to all $\mathcal{O}(\log n)$ nested grids can be encoded as $d$ words of length $\mathcal{O}(\log n)$, or a $d\times\mathcal{O}(\log n)$ bit matrix $M$. Each row corresponds to one of the $d$ coordinates, and the most significant bit of a row indicates the assignment to cells at the highest level (i.e., cells with the largest side length) with respect to that coordinate. In other words, the most significant bits of all coordinates are packed into the first column, etc. We observe that two points $a$ and $b$ lie in the same cell of side length $2^i$ if and only if their matrices agree in all but the last $i$ columns. If we transpose $M$ and read the resulting matrix in row-major order, then finding a point $b\in B$ in the finest grid cell containing $a$ is equivalent to finding the $b$ that shares the longest common prefix with $a$. We show that this transposition can be done using $\mathcal{O}(d\log\log n)$ simple operations on words per point, yielding $\mathcal{O}(nd\log\log n)$ time overall.
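As a concrete illustration of this encoding (our own sketch, not part of the paper), the following Python snippet builds the transposed, row-major bit string of a point and uses longest common prefixes to locate the finest common grid cell; the shift vector z and the parameter t correspond to the construction of Section 3, and we assume every shifted coordinate fits in t bits.

def cell_key(x, z, t):
    """Encode the nested-grid cells of point x (after shift z) as a bit string.

    Column j of the d-by-t bit matrix holds the (j+1)-th most significant bits
    of all d coordinates of floor(x + z); reading the transposed matrix in
    row-major order gives a string such that two points lie in the same cell
    of side length 2^k iff their strings share a prefix of length >= d*(t-k).
    """
    coords = [int(xi + zi) for xi, zi in zip(x, z)]       # floor(x_i + z_i)
    bits = [format(c, f"0{t}b") for c in coords]          # one t-bit row per coordinate
    return "".join(bits[i][j] for j in range(t) for i in range(len(coords)))

def finest_common_cell(q, P, z, t):
    """Return the point p in P whose key shares the longest common prefix with q's key."""
    kq = cell_key(q, z, t)
    def lcp(s1, s2):
        n = 0
        while n < len(s1) and s1[n] == s2[n]:
            n += 1
        return n
    return max(P, key=lambda p: lcp(kq, cell_key(p, z, t)))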

As an aside, we note that quadtree computation is a common task in many geometric algorithms [12]. Although an $\mathcal{O}(n)$ algorithm for this task was known for constant dimension $d$ [10] (assuming that each coordinate can be represented using $\log n$ bits), to the best of our knowledge our algorithm is the first to achieve $\mathcal{O}(nd\log\log n)$ time for arbitrary dimension.

Step 2.

At this point we have computed estimates $\mathcal{D}_a$ such that $\mathsf{opt}_a\le\mathcal{D}_a\le\mathcal{O}(\log n)\cdot\mathsf{opt}_a$. Given these estimates, importance sampling still requires sampling $\Omega(\log n)$ points. Therefore, we improve the running time by approximating (up to a constant factor) the values $\mathsf{opt}_a$, as opposed to computing them exactly. This is achieved by computing $\mathcal{O}(\log\log n)$ random projections of the input points, which ensures that the distance between any fixed pair of points is well-approximated with probability $1-1/\mathrm{poly}(\log n)$. We then employ these projections in a variant of the tournament algorithm of [16], which computes $\mathcal{O}(1)$-approximate estimates of $\mathsf{opt}_a$ for the $\mathcal{O}(\log n)$ sampled points $a$ in $\mathcal{O}(nd\log\log n)$ time. Since the algorithm of [16] works for the $\ell_2$ metric as opposed to the $\ell_1$ metric, we replace Gaussian random projections with Cauchy random projections, and re-analyze the algorithm.

This completes the overview of an $\mathcal{O}(nd\log\log n)$-time algorithm for estimating the Chamfer distance up to a constant factor. To achieve a $(1+\varepsilon)$-approximation guarantee for any $\varepsilon>0$, we proceed as follows. First, instead of sampling $\mathcal{O}(\log n)$ points as before, we sample $\mathcal{O}(\log(n)/\varepsilon^2)$ points $a$. Then, we use the tournament algorithm to compute $\mathcal{O}(1)$-approximations to $\mathsf{opt}_a$, as before. (Note that we could use the tournament algorithm to report $(1+\varepsilon)$-approximate answers, but then the dependence of the running time on $1/\varepsilon$ would become quartic, as the $1/\varepsilon^2$ term in the sample size would be multiplied by another $1/\varepsilon^2$ term in the bound for the number of projections needed to guarantee that the tournament algorithm returns $(1+\varepsilon)$-approximate answers.) Then we use a technique called rejection sampling to simulate the process of sampling $\mathcal{O}(1/\varepsilon^2)$ points $a$ with probability proportional to $\Theta(\mathsf{opt}_a)$. For each such point, we compute $\mathsf{opt}_a$ exactly in $\mathcal{O}(nd)$ time. Finally, we use the $\mathcal{O}(1/\varepsilon^2)$ sampled points $a$ and the exact values of $\mathsf{opt}_a$ in importance sampling to estimate the Chamfer distance up to a factor of $1+\varepsilon$.

This concludes the overview of our algorithm for the Chamfer distance under the $\ell_1$ metric. We remark that [8] also extends their result from the $\ell_1$ metric to the $\ell_2$ metric by first embedding points from $\ell_2$ to $\ell_1$ using random projections. This takes $\mathcal{O}(nd\log n)$ time, which exceeds the runtime of our algorithm, eliminating our improvement. However, a faster embedding method would yield an improved runtime for the Chamfer distance under the $\ell_2$ metric. We leave finding a faster embedding algorithm as an open problem.

2 Preliminaries

In this paper, we consider the regime where the approximation parameter satisfies $\varepsilon\ge\frac{\log^2 n}{\sqrt{n}}$. Note that otherwise, an $\mathcal{O}(nd/\varepsilon^2)$ time bound would be close to the runtime of a naive exact computation.

In the proof of Theorem 1, we assume a Word RAM model where both the input coordinates and the memory/processor words have $\mathcal{O}(\log n)$ bits. This model is particularly important in the procedures Concatenate and Transpose, where we rely on the fact that we can shift bits and perform bit-wise AND, ADD and OR operations on words in constant time.

Notation.

For any integer $n\ge 1$, we use $[n]$ to denote the set of all integers from 1 to $n$. For any two real numbers $a,b$ such that $a\le b$, we use $[a,b]$ to denote the set of all reals from $a$ to $b$. Let $d$ be the dimension of the points.

For any $q\in\mathbb{R}^d$ and any subset $P\subseteq\mathbb{R}^d$, define $\mathsf{opt}_q^P:=\min_{p\in P}\|q-p\|_1$. We will omit the superscript $P$ when it is clear from the context.

3 Quadtree

In Figure 1, we show an algorithm QuadTree that outputs crude estimates of the nearest-neighbor distances simultaneously for a set of points. The estimation guarantee is the same as that of the CrudeNN algorithm in [8]. While [8] achieves this using a quadtree with $\log n$ independent levels, which naturally introduces a $\log n$ runtime overhead, we show that two compressed quadtrees with dependent levels suffice. Our construction of compressed quadtrees is a generalization of [10] to high dimensions.

Input: Two size-$n$ subsets $Q:=\{q_i\}_{i\in[n]}$ and $P:=\{p_i\}_{i\in[n]}$ of the metric space $(\mathbb{R}^d,\ell_1)$, such that $Q,P\subseteq[0,\alpha]^d$ for some bound $\alpha=\mathrm{poly}(n)$.

Output: A set of $n$ values $\{\mathcal{D}_i\}_{i\in[n]}$, such that every $\mathcal{D}_i$ satisfies $\mathcal{D}_i\ge\mathsf{opt}_{q_i}^P$.

  1. Let $t=\lceil\log\alpha\rceil+1$. Sample two uniformly random points $z,z'\in[0,2^{t-1}]^d$. For any point $x\in[0,\alpha]^d$, define

    $h(x):=(\lfloor x_1+z_1\rfloor,\lfloor x_2+z_2\rfloor,\dots,\lfloor x_d+z_d\rfloor),$
    $h'(x):=(\lfloor x_1+z'_1\rfloor,\lfloor x_2+z'_2\rfloor,\dots,\lfloor x_d+z'_d\rfloor),$

    where $x_i,z_i,z'_i$ are the $i$-th coordinates of $x,z,z'$, respectively.

  2. For each $x\in Q\cup P$:

    • Compute $h(x)$ and write each element of $h(x)$ as a $t$-bit binary string. Then $h(x)$ can be viewed as a $d$-by-$t$ binary matrix stored in row-major order, whose $(i,j)$-th entry is the $j$-th most significant bit of the $i$-th element of $h(x)$. Transpose this matrix and concatenate the rows of the transpose. Denote the resulting binary string as $\hat{h}(x)$.

    • Similarly, compute $\hat{h}'(x)$ from $h'(x)$.

  3. Use $\hat{h}(x)$ as keys to sort all $x\in Q\cup P$. Also, use $\hat{h}'(x)$ as keys to sort all $x\in Q\cup P$.

  4. For each $q_i\in Q$:

    • Use the sort to find a $p\in P$ that maximizes the length $l$ of the longest common prefix of $\hat{h}(q_i)$ and $\hat{h}(p)$. Similarly, find a $p'\in P$ that maximizes the length $l'$ of the longest common prefix of $\hat{h}'(q_i)$ and $\hat{h}'(p')$.

    • If $l\ge l'$ then output $\mathcal{D}_i:=\|q_i-p\|_1$; otherwise, output $\mathcal{D}_i:=\|q_i-p'\|_1$.

Figure 1: The QuadTree Algorithm.

Correctness.

For any $x\in[0,\alpha]^d$ and any integer $k$ such that $0\le k\le t$, let $h_k(x):=\big(\big\lfloor\tfrac{x_1+z_1}{2^k}\big\rfloor,\big\lfloor\tfrac{x_2+z_2}{2^k}\big\rfloor,\dots,\big\lfloor\tfrac{x_d+z_d}{2^k}\big\rfloor\big)$, where $z$ is the random point drawn on Line 1 in Figure 1. Observe that $h_k(x)$ is related to the prefixes of $\hat{h}(x)$.

Claim 2.

Let $q,p\in[0,\alpha]^d$ be arbitrary. For any integer $k$ such that $0\le k\le t$, $h_k(q)=h_k(p)$ if and only if $\hat{h}(q)$ and $\hat{h}(p)$ share a common prefix of length at least $d(t-k)$.

Proof.

If $\hat{h}(q)$ and $\hat{h}(p)$ share a common prefix of length at least $d(t-k)$, then in the hashes $h(q)$ and $h(p)$, the first $(t-k)$ bits of all $d$ coordinates are the same. $h_k(q)$ and $h_k(p)$ consist of exactly these bits, thus $h_k(q)=h_k(p)$. The reverse direction holds symmetrically.

Claim 2 justifies using the $h_k(\cdot)$'s as an alternative representation of the binary string $\hat{h}(\cdot)$. [8] shows that $h_k$ has a locality-sensitive property, which will help us bound the distances between points.

Claim 3 (Lemma A.4 of [8]).

For any fixed integer $k$ such that $0\le k\le t$ and any two points $q,p\in[0,\alpha]^d$,

$$\Pr[h_k(q)\ne h_k(p)]\le\frac{\|q-p\|_1}{2^k},$$
$$\Pr[h_k(q)=h_k(p)]\le\exp\Big(-\frac{\|q-p\|_1}{2^k}\Big),$$

where the probabilities are over the random choice of z.

We now show that if two points have the same hash $h_k$, then their distance is likely not too much greater than $2^k$. A straightforward bound follows from the diameter of the $d$-dimensional cube.

Lemma 4.

For all $q\in Q$, $p\in P$, and $0\le k\le t$, the following always holds: if $h_k(q)=h_k(p)$ then $\|q-p\|_1\le 2^k d$.

Proof.

Observe that $h_k(q)=h_k(p)$ only if $q+z$ and $p+z$ are in the same $d$-dimensional cube of side length $2^k$. The diameter of such a cube under the $\ell_1$ norm is $2^k d$. Therefore, for any $q,p$ and $0\le k\le t$, $\|q-p\|_1\le 2^k d$ is a necessary condition for $h_k(q)=h_k(p)$ to hold.

Moreover, using Claim 3, we can bound this ratio with respect to n.

Lemma 5.

With probability at least $1-\mathcal{O}(1/n)$, the following holds simultaneously for all $q\in Q$, $p\in P$, and $0\le k\le t$: if $h_k(q)=h_k(p)$ then $\|q-p\|_1\le 2^k\cdot 3\log n$.

Proof.

We show the contrapositive: with probability $1-\mathcal{O}(1/n)$, $k<\log\big(\frac{\|q-p\|_1}{3\log n}\big)$ implies $h_k(q)\ne h_k(p)$ simultaneously for all $q\in Q$ and $p\in P$. It suffices to argue that for any fixed pair of points $q\in Q$ and $p\in P$, this holds with probability at least $1-\mathcal{O}(1/n^3)$. The lemma then follows by a union bound over the $n^2$ pairs.

Let $k_0$ denote the largest integer $k$ that satisfies $k<\log\big(\frac{\|q-p\|_1}{3\log n}\big)$. Then we have

$$\Pr[h_{k_0}(q)=h_{k_0}(p)]\le\exp\Big(-\frac{\|q-p\|_1}{2^{k_0}}\Big)\le\exp(-3\log n),$$

i.e., with probability at least $1-\mathcal{O}(1/n^3)$, $h_{k_0}(q)\ne h_{k_0}(p)$. Also, it is easy to see that if $h_{k_0}(q)\ne h_{k_0}(p)$, then for all $k\le k_0$, $h_k(q)\ne h_k(p)$, concluding the claim.

Symmetrically, if we define $h'_k(x):=\big(\big\lfloor\tfrac{x_1+z'_1}{2^k}\big\rfloor,\big\lfloor\tfrac{x_2+z'_2}{2^k}\big\rfloor,\dots,\big\lfloor\tfrac{x_d+z'_d}{2^k}\big\rfloor\big)$, the claims and lemmas above also hold for $h'_k$. Using these, we show that the expected outputs of the QuadTree algorithm are (crude) estimates of the nearest neighbor distances.

Theorem 6.

With probability at least $1-\mathcal{O}(1/n)$, it holds for all $q_i\in Q$ that $\mathbb{E}[\mathcal{D}_i]\le 5\min(d,3\log n)\cdot\mathsf{opt}_{q_i}^P$.

Proof.

We assume the success case of Lemma 5 for both $h_k$ and $h'_k$. Fix an arbitrary $q_i\in Q$. Recall that the QuadTree algorithm finds $p,p'\in P$ for $q_i$, which are associated with longest common prefixes of lengths $l,l'$, respectively. For each integer $k$ with $0\le k\le t$, let $\mathcal{E}_k$ denote the event $d(t-k)\le\max(l,l')<d(t-k+1)$. Observe from Claim 2 that when $\mathcal{E}_k$ happens,

  • either $l\ge l'$ and $h_k(q_i)=h_k(p)$,

  • or $l'>l$ and $h'_k(q_i)=h'_k(p')$.

In both cases, we know from Lemmas 4 and 5 that $\mathcal{D}_i\le 2^k\min(d,3\log n)$.

Let $D:=\min(d,3\log n)$, $p^*:=\arg\min_{p\in P}\|q_i-p\|_1$, and $k^*:=\log(\mathsf{opt}_{q_i})$. We have

$$\mathbb{E}[\mathcal{D}_i]\ \le\ \sum_{0\le k\le t}\Pr[\mathcal{E}_k]\,(2^kD)\ \le\ D\Bigg(\sum_{0\le k\le k^*}\Pr[\mathcal{E}_k]\,\mathsf{opt}_{q_i}\ +\ \sum_{k^*<k\le t}\Pr\Big[h_{k-1}(q_i)\ne h_{k-1}(p^*)\ \wedge\ h'_{k-1}(q_i)\ne h'_{k-1}(p^*)\Big]\,2^k\Bigg)$$

where the second inequality holds because $\mathcal{E}_k$ implies that neither the pair $\{\hat{h}(q_i),\hat{h}(p^*)\}$ nor the pair $\{\hat{h}'(q_i),\hat{h}'(p^*)\}$ shares a common prefix of length $d(t-k+1)$. Thus $h_{k-1}(q_i)\ne h_{k-1}(p^*)$ and $h'_{k-1}(q_i)\ne h'_{k-1}(p^*)$ by Claim 2.

Moreover, the events $\mathcal{E}_k$ for all $k$ form a partition of a sample space, so $\sum_k\Pr[\mathcal{E}_k]\le 1$. Applying this and the locality-sensitive properties of $h_{k-1}$ and $h'_{k-1}$, we get

$$\mathbb{E}[\mathcal{D}_i]\ \le\ D\Bigg(\mathsf{opt}_{q_i}+\sum_{k^*<k\le t}\Big(\frac{\mathsf{opt}_{q_i}}{2^{k-1}}\Big)^2 2^k\Bigg)\ \le\ D\Bigg(\mathsf{opt}_{q_i}+2\,\mathsf{opt}_{q_i}\sum_{k^*<k\le t}\frac{\mathsf{opt}_{q_i}}{2^{k-1}}\Bigg)\ \le\ 5D\,\mathsf{opt}_{q_i}$$

Runtime analysis.

Lemma 7 (Line 2).

For any $x\in Q\cup P$, $\hat{h}(x)$ (and $\hat{h}'(x)$) can be computed in $\mathcal{O}(d\log\log n)$ time.

Proof.

We assume without loss of generality that both $d$ and $t$ are powers of 2. Computing the binary matrix representation of $h(x)$ can be done in $\mathcal{O}(d)$ time since $t=\mathcal{O}(\log n)$. Given this, we compute $\hat{h}(x)$ as follows.

Case 1: $d\ge t$.

We partition the matrix into $t$-by-$t$ square submatrices, denoted by

$$\mathsf{Matrix}(h(x)):=\begin{bmatrix}M_1\\ M_2\\ \vdots\\ M_{d/t}\end{bmatrix},$$

where each $M_i$ has $t$ rows and $t$ columns, and the full matrix has $d$ rows.

For each $i\in[d/t]$, we use a recursive subroutine Transpose$(M_i,t)$ to compute $M_i^\top$. See Figure 2 for a pictorial illustration of the Transpose algorithm.

Figure 2: A pictorial example for an 8-by-8 square matrix and $w=4$. The left figure (a) shows how the Transpose algorithm handles the rows of $\tilde{M}$ in Lines 2a and 3a. The right figure (b) illustrates the transpose outcome.

Input: An $I$-by-$J$ bit matrix $M$ where $I\le J$ are powers of 2. An integer $w$ that is a power of 2 and $2\le w\le I$.

Output: An $I$-by-$J$ matrix $M'$ such that if it is partitioned into $w$-by-$w$ square submatrices, then each submatrix is the transpose of the corresponding submatrix of $M$ at the same coordinates.

  1. Let

    $$\tilde{M}=\begin{cases}M & \text{if }w=2\\ \text{Transpose}(M,w/2) & \text{otherwise}\end{cases}$$

    be zero-indexed, and let $\tilde{M}[i,j]$ denote its $(i,j)$-th entry.

  2. For each integer $i$ such that $0\le i<I$:

    1. (a)

      Compute a $J$-bit binary string $b_i$ such that for $j:0\le j<J$, its $j$-th bit is $b_i[j]=\begin{cases}\tilde{M}[i,j] & \text{if }(j\bmod w)<w/2\\ 0 & \text{otherwise.}\end{cases}$

      Also, compute a string $\overline{b_i}$ with $\overline{b_i}[j]=\begin{cases}0 & \text{if }(j\bmod w)<w/2\\ \tilde{M}[i,j] & \text{otherwise.}\end{cases}$

  3. Define an $I$-by-$J$ matrix $M'$, such that for each integer $0\le i<I$:

    1. (a)

      Let the $i$-th row of $M'$ be $\begin{cases}b_i+\big(b_{i+(w/2)}\gg(w/2)\big) & \text{if }(i\bmod w)<w/2\\ \overline{b_i}+\big(\overline{b_{i-(w/2)}}\ll(w/2)\big) & \text{if }(i\bmod w)\ge w/2,\end{cases}$

      where $\gg(w/2)$ (resp. $\ll(w/2)$) denotes the operation of shifting a string to the right (resp. left) by $w/2$ bits.

  4. Output $M'$.

The correctness of the Transpose algorithm can be shown by induction on (the base-2 logarithm of) $w$. When $I=J=t=\mathcal{O}(\log n)$, Lines 2a and 3a can be done using a constant number of operations on words. Thus we get the following runtime.

Claim 8.

Assuming $t=\mathcal{O}(\log n)$, the procedure Transpose$(M_i,t)$ runs in $\mathcal{O}(t\log t)$ time.

We execute the Transpose algorithm for all $i$, which takes $\mathcal{O}((d/t)\cdot t\log t)=\mathcal{O}(d\log\log n)$ time. Then we can write down $\hat{h}(x)$ by concatenating the rows of the $M_i^\top$'s, which takes $\mathcal{O}(t\cdot(d/t))$ time.
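For concreteness, the following Python sketch (ours, not the paper's implementation) mirrors the recursive structure of Transpose: each row is stored as an arbitrary-precision integer, so a mask-and-shift on a row stands in for the constant number of word operations used in the Word RAM analysis.

def transpose_blocks(rows, J, w):
    """Transpose every w-by-w block of an I-by-J bit matrix.

    rows[i] is an integer whose bit (J-1-j) is the matrix entry (i, j).
    """
    if w == 1:
        return list(rows)                      # 1-by-1 blocks are already transposed
    half = w // 2
    rows = transpose_blocks(rows, J, half)     # recurse: all (w/2)-blocks transposed
    # Mask selecting, in every block of w columns, the left half of the columns.
    left_block = ((1 << half) - 1) << half
    left_mask = 0
    for start in range(0, J, w):
        left_mask |= left_block << (J - w - start)
    right_mask = ((1 << J) - 1) ^ left_mask
    out = [0] * len(rows)
    for i in range(len(rows)):
        if (i % w) < half:                     # top half: pull left halves of row i + w/2
            out[i] = (rows[i] & left_mask) | ((rows[i + half] & left_mask) >> half)
        else:                                  # bottom half: pull right halves of row i - w/2
            out[i] = (rows[i] & right_mask) | ((rows[i - half] & right_mask) << half)
    return out

# Example: transpose a single 4-by-4 block.
M = [0b1000, 0b1100, 0b1110, 0b1111]
print([format(r, "04b") for r in transpose_blocks(M, 4, 4)])   # ['1111', '0111', '0011', '0001']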

Case 2: 𝒕>𝒅.

We again partition the matrix into $d$-by-$d$ square submatrices. In this case, we obtain $\mathsf{Matrix}(h(x))=M:=[M_1\ M_2\ \cdots\ M_{t/d}]$, a matrix with $d$ rows and $t$ columns whose submatrices are arranged horizontally.

Claim 9.

Given $t=\mathcal{O}(\log n)$, Transpose$(M,d)$ runs in $\mathcal{O}(d\log d)\le\mathcal{O}(d\log\log n)$ time.

We execute Transpose$(M,d)$ and obtain $M'=[M_1^\top\ M_2^\top\ \cdots\ M_{t/d}^\top]$. In principle, to obtain $\hat{h}(x)$, we just concatenate the $d\cdot(t/d)$ rows of all the $M_i^\top$'s. However, when $t\gg d$, this takes longer than $\mathcal{O}(d\log\log n)$ time. We instead use another recursive subroutine Concatenate$(M',d)$. An example of the Concatenate algorithm is given in Figure 3.

Figure 3: A pictorial example that shows the behavior of Lines 2a and 3a of the Concatenate algorithm on a 4-by-$4w$ matrix.

Input: An $I$-by-$J$ bit matrix $M$ where $I\le J$ are powers of 2. An integer $w$ that is a power of 2 and $I\le w\le J$.

Output: An $IJ$-bit string $B$ such that if it is partitioned into $w$-bit blocks, then the $u$-th block (zero-indexed from left to right) consists of the bits on the $(u\bmod I)$-th row of $M$ from column $w\lfloor u/I\rfloor$ to column $w\lfloor u/I\rfloor+w$.

  1. If $I=1$ then output $B:=M$ (the single row of $M$, read as a bit string).

  2. For each integer $i$ such that $0\le i<I$:

    1. (a)

      Partition the $i$-th row of $M$ into $w$-bit blocks, denoted as $[b_{i,1}\ b_{i,2}\ \cdots\ b_{i,J/w}]$. Compute a $2J$-bit string $b'_i=[b_{i,1}\ 0^w\ b_{i,2}\ 0^w\ \cdots\ b_{i,J/w}\ 0^w]$, where $0^w$ is a $w$-bit all-zero string.

  3. Define an $I/2$-by-$2J$ matrix $M'$, such that for each integer $0\le i<I/2$:

    1. (a)

      Let the $i$-th row of $M'$ be $b'_{2i}+\big(b'_{2i+1}\gg w\big)$, where $\gg w$ is the operation of shifting a string to the right by $w$ bits.

  4. Output Concatenate$(M',2w)$.

The correctness of the Concatenate algorithm can again be observed by inducting on the logarithm of $w$. Lines 2a and 3a can be done using $\mathcal{O}\big((J/w)\cdot\lceil w/\log n\rceil\big)$ operations on words, and both lines are repeated $I$ times in each recursive call. Therefore, the total runtime is

$$\mathcal{O}\left(\sum_{s=1}^{\log d}2^s\cdot\frac{2^{\log(d)-s}\,t}{2^{\log(d)-s}\,d}\cdot\left\lceil\frac{2^{\log(d)-s}\,d}{\log n}\right\rceil\right)=\mathcal{O}\left(\sum_{s=1}^{\log d}2^s\cdot\max\left(\frac{t}{d},\ \frac{t}{d}\cdot\frac{2^{\log(d)-s}\,d}{\log n}\right)\right)=\mathcal{O}\left(\log d\cdot\max\left(t,\ \frac{td}{\log n}\right)\right)=\mathcal{O}(d\log\log n).$$

Theorem 10.

The QuadTree algorithm runs in 𝒪(ndloglogn) time.

Proof.

Computing $h(x)$ for all $x$ takes $\mathcal{O}(nd)$ time. Then computing the $\hat{h}(x)$'s takes $\mathcal{O}(nd\log\log n)$ time. After that, sorting $\mathcal{O}(n)$-many $\mathcal{O}(d\log n)$-bit strings can be done in $\mathcal{O}(nd)$ time using radix sort. Finally, to find the $p\in P$ with the longest common prefix for every $q\in Q$, we go through the sorted list and link each $q\in Q$ with its adjacent $p\in P$, which takes $\mathcal{O}(n)$ total time. The above time bounds also hold for the $\hat{h}'(x)$'s, resulting in $\mathcal{O}(nd\log\log n)$ time in total.

4 Tournament

In this section, we compute 2-approximations of the nearest neighbor distances for logarithmically many queries. We do so using a depth-2 tournament. In one branch of the tournament tree, we sample a small set $\bar{S}$ of input points uniformly at random. In the other branch, we partition the input points into random groups, project them to a lower-dimensional space, and then collect the nearest neighbor in the projected space in every group as a set $\tilde{S}$. The final output of the tournament is the nearest neighbor among the points in $\bar{S}\cup\tilde{S}$ in the original space. Intuitively, if there are many (2-approximate) near neighbors, then a random subset $\bar{S}$ should contain one of them. And when there are few, these neighbors are likely to be assigned to different random groups, in which case the true nearest neighbor should be collected into $\tilde{S}$.

Notation.

We use the same notation $D:=\min(d,3\log n)$ as in the previous section. For any finite subset $T\subset\mathbb{R}$, let $\mathsf{med}\,T$ denote the median of $T$.

When working under the $\ell_1$ norm, we use Cauchy random variables to project points. We first recall a standard bound on the median of projections, which will be useful for our analysis. (The following lemma essentially follows from Claim 2 and Lemma 2 in [13]; we reprove it in the appendix for completeness.)

Lemma 11.

Let $x,y\in\mathbb{R}^d$ and $0<c<1/2$. Sample $r$ random vectors $v_1,v_2,\dots,v_r\sim(\mathsf{Cauchy}(0,1))^d$. With probability at least $1-2e^{-rc^2/50}$, $\mathsf{med}\{|v_i\cdot(x-y)|:i\in[r]\}\in(1\pm c)\|x-y\|_1$.
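As a quick numerical illustration of Lemma 11 (our own snippet, not part of the paper), the median of $|v_i\cdot(x-y)|$ over i.i.d. Cauchy projection vectors concentrates around the $\ell_1$ distance:

import numpy as np

rng = np.random.default_rng(0)
d, r = 200, 800
x, y = rng.uniform(0, 10, d), rng.uniform(0, 10, d)
V = rng.standard_cauchy((r, d))          # r projection vectors with i.i.d. Cauchy(0,1) entries
est = np.median(np.abs(V @ (x - y)))     # median of projected distances
print(est, np.abs(x - y).sum())          # estimate vs. the true l1 distance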

In Figure 4, we describe how to construct a data structure to find 2-approximate nearest neighbors. The construction borrows ideas from the second algorithm of [16], but uses a tournament of depth 2 instead of $\mathcal{O}(\log n)$.

Input: A set of $t$ queries $\{q_i\}_{i\in[t]}$ and a set of $n$ points $P$, both subsets of the metric space $(\mathbb{R}^d,\ell_1)$.

Output: A set of $t$ values $\{\mathcal{D}_i\}_{i\in[t]}$, such that every $\mathcal{D}_i$ satisfies $\mathcal{D}_i\ge\mathsf{opt}_{q_i}^P$.

Building the Data Structure.

  1. Let $r\ge 800(2\log t+\log\log n)$.

  2. For each $j\in[r]$, draw $v_j\sim(\mathsf{Cauchy}(0,1))^d$, compute $v_j\cdot p$ for all points $p\in P$, and store all $v_j$ and $v_j\cdot p$.

  3. Randomly partition $P$ into $n/\log n$ subsets $P_1,P_2,\dots,P_{n/\log n}$, each of size $\log n$.

Processing the Queries.

For each query $q:=q_i$ for $i\in[t]$:

  1. Compute $v_j\cdot q$ for all $j\in[r]$.

  2. Let $\tilde{S}$ be an empty set. For each $k\in[n/\log n]$:

    • Compute $\mathsf{med}_p:=\mathsf{med}\{|v_j\cdot(q-p)|:j\in[r]\}$ for every $p\in P_k$.

    • Find $p^*:=\arg\min_{p\in P_k}\{\mathsf{med}_p\}$ and add it into $\tilde{S}$.

  3. Find $\tilde{p}=\arg\min_{p\in\tilde{S}}\|q-p\|_1$ by computing and comparing all exact distances $\|q-p\|_1$ for $p\in\tilde{S}$.

  4. Let $\bar{S}$ be a set of $w\ge 90t\log t\log n$ samples drawn uniformly at random from $P$. Find a point $\bar{p}=\arg\min_{p\in\bar{S}}\|q-p\|_1$ by computing and comparing all exact distances $\|q-p\|_1$ for $p\in\bar{S}$.

  5. Output $\mathcal{D}_i:=\min(\|q-\tilde{p}\|_1,\|q-\bar{p}\|_1)$.

Figure 4: The Tournament Algorithm.
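The query phase of Figure 4 can be summarized by the following Python sketch (our illustration, with minor simplifications); here V is the r-by-d matrix of Cauchy projection vectors, VP the stored projections of P, groups the random partition from the building phase, and num_uniform plays the role of w.

import numpy as np

def tournament_query(q, P, V, VP, groups, num_uniform, rng):
    """Return a 2-approximation of the l1 nearest-neighbor distance of q in P (sketch)."""
    vq = V @ q                                   # project the query, O(dr) time
    l1 = lambda p: np.abs(q - p).sum()

    # Branch 1: in each group, keep the point minimizing the median projected distance.
    S_tilde = []
    for idx in groups:
        meds = np.median(np.abs(VP[:, idx] - vq[:, None]), axis=0)
        S_tilde.append(idx[np.argmin(meds)])
    p_tilde = min((P[i] for i in S_tilde), key=l1)

    # Branch 2: uniformly sampled candidates, compared by exact l1 distance.
    S_bar = rng.integers(0, len(P), size=num_uniform)
    p_bar = min((P[i] for i in S_bar), key=l1)

    return min(l1(p_tilde), l1(p_bar))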

Correctness.

Fix a query $q:=q_i$. Let $S$ denote the set of all 2-approximate nearest neighbors of $q$, i.e., $S:=\{p\in P:\|q-p\|_1\le 2\,\mathsf{opt}_q\}$. We prove the correctness of the algorithm by casing on the size of $S$.

Lemma 12 (Case 1).

If $|S|\ge\frac{n}{30t\log n}$ then with probability at least $1-\frac{1}{10t}$ we have $\bar{p}\in S$.

Proof.

With probability at least $\frac{1}{30t\log n}$, a random sample from $P$ is in $S$. Therefore, for $\bar{S}$ containing $w$ independent samples, at least one of them is in $S$ with probability at least $1-\big(1-\frac{1}{30t\log n}\big)^w\ge 1-e^{-w/(30t\log n)}$. Moreover, when this happens, it must be that $\bar{p}\in S$. Setting $w\ge 90t\log t\log n$ gives the desired probability.

Lemma 13 (Case 2).

If $|S|<\frac{n}{30t\log n}$ then with probability at least $1-\frac{1}{10t}$, $\tilde{p}\in S$.

Let $p^*\in P$ denote a nearest neighbor of $q$, i.e., $\|q-p^*\|_1=\mathsf{opt}_q$. To prove Lemma 13, we first make the following observation:

Lemma 14.

Let $P'$ be an arbitrary subset of $P\setminus S$. The probability that there exists $p\in P'$ such that $\mathsf{med}_p\le\mathsf{med}_{p^*}$, where $\mathsf{med}_p:=\mathsf{med}\{|v_j\cdot(q-p)|:j\in[r]\}$ and $\mathsf{med}_{p^*}:=\mathsf{med}\{|v_j\cdot(q-p^*)|:j\in[r]\}$, is at most $\frac{2(|P'|+1)}{t^2\log n}$.

Proof.

From $P'\subseteq(P\setminus S)$ we know that $\|q-p\|_1>2\,\|q-p^*\|_1$ for any $p\in P'$. Therefore, if $\mathsf{med}_p\le\mathsf{med}_{p^*}$ then either $\mathsf{med}_p\notin(1\pm\frac{1}{4})\|q-p\|_1$ or $\mathsf{med}_{p^*}\notin(1\pm\frac{1}{4})\|q-p^*\|_1$. Applying Lemma 11 with $c=\frac{1}{4}$ and $r\ge 800(2\log t+\log\log n)$, and a union bound, we get that

$$\Pr[\exists p\in P':\mathsf{med}_p\le\mathsf{med}_{p^*}]\ \le\ \Pr\big[\exists p\in P':\mathsf{med}_p\notin(1\pm\tfrac{1}{4})\|q-p\|_1\big]+\Pr\big[\mathsf{med}_{p^*}\notin(1\pm\tfrac{1}{4})\|q-p^*\|_1\big]\ \le\ (|P'|+1)\cdot 2e^{-(2\log t+\log\log n)}\ =\ \frac{2(|P'|+1)}{t^2\log n}.$$

In Line 3 of the data structure building procedure, the point $p^*$ is assigned to one of the subsets $P'\in\{P_1,P_2,\dots,P_{n/\log n}\}$. If $|S|<\frac{n}{30t\log n}$, then one can show that $p^*$ is likely the only 2-approximate nearest neighbor in $P'$. Conditioned on this, we can use Lemma 14 to show that $p^*$ is added into $\tilde{S}$ with high probability.

Lemma 15.

If $|S|<\frac{n}{30t\log n}$, then with probability at least $1-\frac{1}{10t}$, $p^*$ is added into $\tilde{S}$ on Line 2 when processing the query $q$.

Proof.

The set $P'$ contains $\log n$ points. Since we randomly partition $P$ into $P_1,\dots,P_{n/\log n}$, $P'\setminus\{p^*\}$ is a uniformly random subset of $P$. When $|S|<\frac{n}{30t\log n}$, $\Pr[(P'\setminus\{p^*\})\cap S=\emptyset]\ge 1-\frac{1}{20t}$. Conditioned on this event, we have

$$\Pr\big[\exists p\in(P'\setminus\{p^*\}):\mathsf{med}_p\le\mathsf{med}_{p^*}\big]\le\frac{2|P'|}{t^2\log n}\le\frac{1}{20t}$$

by Lemma 14, as long as $t\ge 40$. Therefore, with probability at least $1-\frac{1}{10t}$, $p^*$ has the smallest median of projected distances to $q$ within its group, and thus must be added to $\tilde{S}$.

Lemma 13 is a direct corollary of Lemma 15.

Proof (of Lemma 13).

$p^*\in\tilde{S}$ implies that if we compute $\tilde{p}=\arg\min_{p\in\tilde{S}}\|q-p\|_1$, we are guaranteed to find a nearest neighbor of $q$, which is clearly in $S$.

Combining Lemma 12 and 13 we get the following correctness guarantee:

Theorem 16.

Given $t$ queries $\{q_i\}_{i\in[t]}$, with probability at least $9/10$, the Tournament algorithm outputs 2-approximate nearest neighbor distances simultaneously for all $t$ queries.

Finally, we state the runtime guarantee as follows:

Theorem 17.

The Tournament algorithm runs in $\mathcal{O}\big(n(d+t)(\log t+\log\log n)+dt^2\log t\log n\big)$ time.

Proof.

For preprocessing, the algorithm projects all points in $P$ using $r$ projections, which takes $\mathcal{O}(ndr)$ time. To process a query $q$, we first take $\mathcal{O}(dr)$ time to project $q$. We then count the number of comparisons we make to find the minimums of the medians, which is $\mathcal{O}((n/\log n)\cdot\log n\cdot r)$ using a linear-time median selection algorithm [9]. Each comparison can be done in $\mathcal{O}(1)$ time given that $v_j\cdot p$ and $v_j\cdot q$ for all $j\in[r]$ and $p\in P$ are stored. Finally, we do a linear scan over $\tilde{S}$ and $\bar{S}$, which takes $\mathcal{O}(d(n/\log n+w))$ time.

We plug in r=𝒪(logt+loglogn) and w=𝒪(tlogtlogn). For t queries, the total runtime is 𝒪(n(d+t)(logt+loglogn)+dt2logtlogn).

For our purpose of estimating the Chamfer distance, we will apply the Tournament algorithm with a number of queries $t=\Theta(D/\varepsilon^2)$ for $D=\min(d,3\log n)$ and some $\varepsilon>0$ satisfying $\varepsilon^{-2}=\mathcal{O}(n/\log^4 n)$. Under this setting, the runtime is dominated by the first additive term of Theorem 17, which is at most $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

5 Rejection Sampling

Notation.

All occurrences of $\mathsf{opt}$ in this section are with respect to the set $B$. Let $\varepsilon>0$ be our target approximation factor. We call a distribution $\mathcal{P}$ an $f$-Chamfer distribution for some $f=f(n,d,\varepsilon)$ if it is supported on $A$ and for every $a\in A$,

$$\mathcal{P}(a)\ge\frac{f\cdot\mathsf{opt}_a}{\mathsf{CH}(A,B)},\qquad\text{where we denote }\mathcal{P}(a):=\Pr_{x\sim\mathcal{P}}[x=a].$$

We first show a general bound for estimating the Chamfer distance using samples from a Chamfer distribution. This follows from a standard analysis of importance sampling.

Lemma 18.

Let $X:=\{x_i\}_{i\in[t]}$ be a set of $t$ samples drawn from an $f$-Chamfer distribution $\mathcal{P}$. Fix $h=h(n,d,\varepsilon)\ge 1$. Given an arbitrary $\widetilde{\mathsf{opt}}_{x_i}$ for every $x_i$ that satisfies $\mathsf{opt}_{x_i}\le\widetilde{\mathsf{opt}}_{x_i}\le h\cdot\mathsf{opt}_{x_i}$, then for any $0<\kappa<1$,

$$\Pr\big[\widetilde{\mathsf{CH}}(A,B)\le(1-\kappa)\,\mathsf{CH}(A,B)\big]+\Pr\big[\widetilde{\mathsf{CH}}(A,B)\ge(1+\kappa)h\,\mathsf{CH}(A,B)\big]\le\frac{h^2f^{-1}-1}{t\kappa^2},$$

where $\widetilde{\mathsf{CH}}(A,B):=\frac{1}{t}\sum_{i\in[t]}\widetilde{\mathsf{opt}}_{x_i}/\mathcal{P}(x_i)$.

Proof.

For the purpose of analysis, assume that we additionally have arbitrary $\widetilde{\mathsf{opt}}_a$ for $a\in(A\setminus X)$ that also satisfy $\mathsf{opt}_a\le\widetilde{\mathsf{opt}}_a\le h\,\mathsf{opt}_a$. By linearity,

$$\mathbb{E}[\widetilde{\mathsf{CH}}(A,B)]=\frac{1}{t}\sum_{i\in[t]}\mathbb{E}\big[\widetilde{\mathsf{opt}}_{x_i}/\mathcal{P}(x_i)\big]=\sum_{a\in A}\mathcal{P}(a)\cdot\frac{\widetilde{\mathsf{opt}}_a}{\mathcal{P}(a)}\in\big[\mathsf{CH}(A,B),\ h\,\mathsf{CH}(A,B)\big].$$

We also bound the variance

$$\mathrm{Var}[\widetilde{\mathsf{CH}}(A,B)]\ \le\ \frac{\mathbb{E}\big[\widetilde{\mathsf{opt}}_{x_1}^2/\mathcal{P}(x_1)^2\big]-\mathsf{CH}(A,B)^2}{t}\ \le\ \frac{1}{t}\Big(\sum_{a\in A}\frac{\widetilde{\mathsf{opt}}_a^2}{\mathcal{P}(a)}-\mathsf{CH}(A,B)^2\Big)\ \le\ \frac{1}{t}\Big(\frac{h}{f}\,\mathsf{CH}(A,B)\sum_{a\in A}\widetilde{\mathsf{opt}}_a-\mathsf{CH}(A,B)^2\Big)\ \le\ \frac{1}{t}\,\mathsf{CH}(A,B)^2\big(h^2f^{-1}-1\big)$$

where the third inequality follows from $\frac{1}{\mathcal{P}(a)}\le\frac{\mathsf{CH}(A,B)}{f\,\mathsf{opt}_a}$ and $\widetilde{\mathsf{opt}}_a\le h\,\mathsf{opt}_a$. Finally, by Chebyshev's Inequality, we have

$$\Pr\big[\big|\widetilde{\mathsf{CH}}(A,B)-\mathbb{E}[\widetilde{\mathsf{CH}}(A,B)]\big|\ge\kappa\,\mathsf{CH}(A,B)\big]\le\frac{1}{t}\cdot\frac{h^2f^{-1}-1}{\kappa^2}.$$

In this section, we aim to construct a set of samples S={sj}j[s] for some large enough s, such that each sj is drawn from a fixed 𝒪(1)-Chamfer distribution. Once we have S, we can compute a weighted sum of the nearest neighbor distances for sjS, and invoke Lemma 18 to show that it is likely an (1+ε)-estimation of 𝖢𝖧(A,B).

We will construct such S via a two-step sampling procedure: in the first step, we sample Θ(D/ε2) points from A using a distribution defined by the estimations from the QuadTree algorithm. In the second step, we subsample these Θ(D/ε2) points, using an acceptance probability defined by the estimations from the Tournament algorithm. We describe our Chamfer-Estimate algorithm in Figure 5.

Input: Two subsets $A,B$ of a metric space $(\mathbb{R}^d,\ell_1)$, each of size $n$, a parameter $\varepsilon>0$, and a parameter $q$.

Output: An estimated value $\widetilde{\mathsf{CH}}(A,B)$.

  1. Execute the algorithm QuadTree$(A,B)$, and let the output be a set of values $\{\mathcal{D}_a\}_{a\in A}$, which always satisfy $\mathcal{D}_a\ge\mathsf{opt}_a$. Let $\mathcal{D}:=\sum_{a\in A}\mathcal{D}_a$.

  2. Construct a probability distribution $\mathcal{P}$ supported on $A$ such that for every $a\in A$, $\mathcal{P}(a)=\frac{\mathcal{D}_a}{\mathcal{D}}$. For $i\in[q]$, sample $x_i\sim\mathcal{P}$.

  3. Execute the algorithm Tournament$(\{x_i\}_{i\in[q]},B)$, and let the output be a set of values $\{\mathcal{D}'_{x_i}\}_{i\in[q]}$, which always satisfy $\mathcal{D}'_{x_i}\ge\mathsf{opt}_{x_i}$. Let $\mathcal{D}':=\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\big/q$ and denote $\mathcal{P}'(a):=\frac{\mathcal{D}'_a}{\mathcal{D}'}$ (which is well-defined only if $a=x_i$ for some $i\in[q]$).

  4. Define

    $$M:=\max_{i\in[q]}\frac{\mathcal{P}'(x_i)}{\mathcal{P}(x_i)}.$$

    For each $i\in[q]$, mark $x_i$ as accepted with probability $\frac{\mathcal{P}'(x_i)}{M\,\mathcal{P}(x_i)}$.

    If the number of accepted $x_i$ is less than $s=10/\varepsilon^2$ then output Fail and exit the algorithm. Otherwise, collect the first $s$ accepted $x_i$ as a set $S:=\{s_j\}_{j\in[s]}$.

  5. Compute $\mathsf{opt}_{s_j}$ for each $j$. Output

    $$\widetilde{\mathsf{CH}}(A,B):=\frac{1}{s}\sum_{j\in[s]}\frac{\mathsf{opt}_{s_j}}{\mathcal{P}'(s_j)}.$$
Figure 5: The Chamfer-Estimate Algorithm.
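The two-step sampling of Figure 5 can be summarized by the following Python sketch (our illustration; quadtree_estimates, tournament_estimates, and exact_nn_dist are hypothetical stand-ins for the QuadTree, Tournament, and brute-force subroutines):

import random

def chamfer_estimate(A, B, eps, q, quadtree_estimates, tournament_estimates, exact_nn_dist):
    """(1 +- eps)-estimate of CH(A, B) via importance sampling with rejection (sketch)."""
    D = quadtree_estimates(A, B)                  # D[a] >= opt_a, crude estimates
    D_sum = sum(D[a] for a in A)
    P = {a: D[a] / D_sum for a in A}              # first-stage sampling distribution

    xs = random.choices(list(A), weights=[P[a] for a in A], k=q)
    Dp = tournament_estimates(xs, B)              # Dp[x] in [opt_x, 2 * opt_x]
    Dp_norm = sum(Dp[x] / P[x] for x in xs) / q   # the normalizer D'
    Pp = {x: Dp[x] / Dp_norm for x in xs}         # refined distribution P'

    M = max(Pp[x] / P[x] for x in xs)
    accepted = [x for x in xs if random.random() < Pp[x] / (M * P[x])]

    s = int(10 / eps ** 2)
    if len(accepted) < s:
        raise RuntimeError("Fail")                # low-probability failure event
    S = accepted[:s]
    return sum(exact_nn_dist(x, B) / Pp[x] for x in S) / s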

The Chamfer-Estimate algorithm applies the QuadTree algorithm and the Tournament algorithm as subroutines. If they are executed successfully, their outputs should satisfy the following conditions:

Condition 19.

We say the QuadTree algorithm succeeds if for every $a\in A$, $\mathbb{E}[\mathcal{D}_a]\le 5D\cdot\mathsf{opt}_a$.

Condition 20.

We say the Tournament algorithm succeeds if for every $x_i$, $i\in[q]$, $\mathcal{D}'_{x_i}\le 2\,\mathsf{opt}_{x_i}$.

That is, as described in the introduction, we need QuadTree to provide 𝒪(logn)-approximation (to ensure that the sample size q can be at most logarithmic in n), and that Tournament provide 𝒪(1)-approximation (to ensure that the final estimator using s samples has variance bounded by a constant).

We state some facts about the Chamfer-Estimate algorithm, which will be useful for our analysis.

Claim 21 (Line 2).

Under Condition 19, with probability at least $9/10$, $\mathcal{P}$ is a $\frac{1}{50D}$-Chamfer distribution.

Proof.

With probability at least $9/10$, $\mathcal{D}\le 50D\cdot\mathsf{CH}(A,B)$ by Markov's Inequality. Conditioned on this event, for any $a\in A$, $\mathcal{P}(a)=\frac{\mathcal{D}_a}{\mathcal{D}}\ge\frac{\mathsf{opt}_a}{50D\cdot\mathsf{CH}(A,B)}$.

Claim 22 (Line 3).

Let $q\ge 10^4 D$. Under Conditions 19 and 20, with probability at least $4/5$, $\mathcal{D}'\ge\mathsf{CH}(A,B)/2$.

Proof.

We apply the importance sampling analysis of Lemma 18. Assuming that Claim 21 holds and that $\mathsf{opt}_{x_i}\le\mathcal{D}'_{x_i}\le 2\,\mathsf{opt}_{x_i}$, we get

$$\Pr\Big[\mathcal{D}'\le\big(1-\tfrac{1}{2}\big)\mathsf{CH}(A,B)\Big]\le\frac{2^2\cdot 50D-1}{q}\cdot\Big(\frac{1}{2}\Big)^{-2}<\frac{1}{10}.$$

Analysis of 𝑺.

We now show that the set $S$ constructed on Line 4 collects enough samples (thus the algorithm does not fail) and is equivalent to sampling from an $\mathcal{O}(1)$-Chamfer distribution $\mathcal{Q}$. We note that the algorithm, in fact, only knows the $\frac{1}{50D}$-Chamfer distribution $\mathcal{P}$ and the probabilities $\mathcal{P}'(x_i)$ for $\{x_i\}_{i\in[q]}$, so it cannot explicitly sample from such a $\mathcal{Q}$. Nevertheless, by a standard analysis of rejection sampling, we show that $S$ "simulates" sampling from $\mathcal{Q}$.

Lemma 23.

Let $q\ge 10^4 D/\varepsilon^2$. Under Conditions 19 and 20, with probability at least $3/5$, the number of accepted $x_i$ is at least $s$, so the algorithm does not fail.

Proof.

We assume that Claims 21 and 22 hold. Then for any $x_i$, $\frac{1}{\mathcal{P}(x_i)}\le\frac{50D\cdot\mathsf{CH}(A,B)}{\mathsf{opt}_{x_i}}$ and $\mathcal{P}'(x_i)=\frac{\mathcal{D}'_{x_i}}{\mathcal{D}'}\le\frac{2\,\mathsf{opt}_{x_i}}{\mathsf{CH}(A,B)/2}$. Thus $M\le 200D$. The expectation is

$$\mathbb{E}\big[|\{\text{accepted }x_i\}|\big]=\sum_{i\in[q]}\frac{\mathcal{P}'(x_i)}{M\,\mathcal{P}(x_i)}\ge\frac{1}{200D}\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\cdot\frac{1}{\mathcal{D}'}=\frac{1}{200D}\cdot q\,\mathcal{D}'\cdot\frac{1}{\mathcal{D}'}=\frac{q}{200D},$$

where the second-to-last equality is due to the definition of $\mathcal{D}':=\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\big/q$. The final bound holds by Markov's Inequality and our setting of $q$.

Lemma 24.

Each $s_j$ is independently and identically distributed, and under Condition 20, $\Pr[s_j=a]\ge\frac{\mathsf{opt}_a}{2\,\mathsf{CH}(A,B)}$ for any $a\in A$.

Proof.

The independence and identical distribution follow directly from our sampling procedure. For the probability statement, we assume (without loss of generality) that during the rejection sampling on Line 4, a sample $x_i$ is accepted and renamed $s_j$. Then for any $a\in A$,

$$\Pr[s_j=a]=\Pr[x_i=a\mid x_i\text{ accepted}]=\frac{\mathcal{P}(a)\Pr[x_i\text{ accepted}\mid x_i=a]}{\Pr[x_i\text{ accepted}]}=\frac{\mathcal{P}(a)\Pr[x_i\text{ accepted}\mid x_i=a]}{\sum_{a_0\in A}\mathcal{P}(a_0)\Pr[x_i\text{ accepted}\mid x_i=a_0]}=\frac{\mathcal{P}(a)\cdot\frac{\mathcal{P}'(a)}{M\,\mathcal{P}(a)}}{\sum_{a_0\in A}\mathcal{P}(a_0)\cdot\frac{\mathcal{P}'(a_0)}{M\,\mathcal{P}(a_0)}}$$

In the final equality, because we conditioned on $x_i=a$ (resp. $x_i=a_0$) on the left-hand side, we know that on the right-hand side, $\mathcal{P}'(a)=\frac{\mathcal{D}'(a)}{\mathcal{D}'}$ is well-defined and satisfies $\mathsf{opt}_a\le\mathcal{D}'(a)\le 2\,\mathsf{opt}_a$ (resp. $\mathsf{opt}_{a_0}\le\mathcal{D}'(a_0)\le 2\,\mathsf{opt}_{a_0}$), given Condition 20. Therefore, we have

$$\Pr[s_j=a]=\frac{\mathcal{P}'(a)}{\sum_{a_0\in A}\mathcal{P}'(a_0)}=\frac{\mathcal{D}'(a)}{\sum_{a_0\in A}\mathcal{D}'(a_0)}\ge\frac{\mathsf{opt}_a}{2\,\mathsf{CH}(A,B)}.$$

Lemmas 23 and 24 together say that $S$ can be viewed as a set of $s$ samples from a $\frac{1}{2}$-Chamfer distribution; thus we can invoke another importance sampling analysis. In the final step of the algorithm, we compute the exact nearest neighbor distance for each $s_j$ and then compute a weighted sum over them. With high probability, this gives a $(1\pm\varepsilon)$-estimation of $\mathsf{CH}(A,B)$.

Theorem 25.

Under Conditions 19 and 20, Chamfer-Estimate$(A,B,\varepsilon,q)$ with $q\ge 10^4D/\varepsilon^2$ outputs $\widetilde{\mathsf{CH}}(A,B)$ satisfying $(1-\varepsilon)\,\mathsf{CH}(A,B)\le\widetilde{\mathsf{CH}}(A,B)\le(1+\varepsilon)\,\mathsf{CH}(A,B)$ with probability at least $1/2$.

Proof.

In the success case of Lemma 23, we can apply Lemma 18 with f=1/2, h=1, t=s, and κ=ε. Then

$$\Pr\big[|\widetilde{\mathsf{CH}}(A,B)-\mathsf{CH}(A,B)|\ge\varepsilon\,\mathsf{CH}(A,B)\big]\le\frac{1}{s\,\varepsilon^2}.$$

Theorem 26.

Chamfer-Estimate$(A,B,\varepsilon,q=10^4D/\varepsilon^2)$ runs in time $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

Proof.

This is dominated by the runtime of QuadTree, Tournament, and the time of computing 𝗈𝗉𝗍sj on Line 5. QuadTree(A,B) runs in 𝒪(ndloglogn) time and Tournament ({xi}i[q],B) runs in 𝒪(nd(loglogn+log1ε)/ε2) time. Finally, the brute-force search for 𝗈𝗉𝗍sj for j[10/ε2] takes 𝒪(nd/ε2) time.

References

  • [1] Pdal: Chamfer. https://pdal.io/en/2.4.3/apps/chamfer.html, 2023. Accessed: 2023-05-12.
  • [2] Pytorch3d: Loss functions. https://pytorch3d.readthedocs.io/en/latest/modules/loss.html, 2023. Accessed: 2023-05-12.
  • [3] Tensorflow graphics: Chamfer distance. https://www.tensorflow.org/graphics/api_docs/python/tfg/nn/loss/chamfer_distance/evaluate, 2023. Accessed: 2023-05-12.
  • [4] Arne Andersson, Torben Hagerup, Stefan Nilsson, and Rajeev Raman. Sorting in linear time? In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’95, pages 427–436, New York, NY, USA, 1995. Association for Computing Machinery. doi:10.1145/225058.225173.
  • [5] Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 793–801, 2015. doi:10.1145/2746539.2746553.
  • [6] Kubilay Atasu and Thomas Mittelholzer. Linear-complexity data-parallel earth mover’s distance approximations. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 364–373. PMLR, 09–15 June 2019. URL: https://proceedings.mlr.press/v97/atasu19a.html.
  • [7] Vassilis Athitsos and Stan Sclaroff. Estimating 3d hand pose from a cluttered image. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., volume 2, pages II–432. IEEE, 2003.
  • [8] Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, and Erik Waingarten. A near-linear time algorithm for the chamfer distance, 2023. doi:10.48550/arXiv.2307.03043.
  • [9] Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448–461, August 1973. doi:10.1016/S0022-0000(73)80033-9.
  • [10] Timothy M. Chan. Well-separated pair decomposition in linear time? Information Processing Letters, 107(5):138–141, 2008. doi:10.1016/j.ipl.2008.02.008.
  • [11] Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.
  • [12] Sariel Har-Peled. Geometric approximation algorithms. Number 173. American Mathematical Soc., 2011.
  • [13] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006. doi:10.1145/1147954.1147955.
  • [14] Li Jiang, Shaoshuai Shi, Xiaojuan Qi, and Jiaya Jia. Gal: Geometric adversarial loss for single-view 3d-object reconstruction. In Proceedings of the European conference on computer vision (ECCV), pages 802–816, 2018.
  • [15] Omar Khattab and Matei Zaharia. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 39–48, 2020. doi:10.1145/3397271.3401075.
  • [16] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’97, pages 599–608, New York, NY, USA, 1997. Association for Computing Machinery. doi:10.1145/258533.258653.
  • [17] Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. In International conference on machine learning, pages 957–966. PMLR, 2015. URL: http://proceedings.mlr.press/v37/kusnerb15.html.
  • [18] Erik B Sudderth, Michael I Mandel, William T Freeman, and Alan S Willsky. Visual hand tracking using nonparametric belief propagation. In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pages 189–189. IEEE, 2004.
  • [19] Ziyu Wan, Dongdong Chen, Yan Li, Xingguang Yan, Junge Zhang, Yizhou Yu, and Jing Liao. Transductive zero-shot learning with visual structure constraint. Advances in neural information processing systems, 32, 2019.

Appendix A Reducing the Bit Precision of Inputs

In our algorithm, we assumed that all points in the input sets $A,B$ are integers in $\{1,2,\dots,\mathrm{poly}(n)\}^d$. Here, we show that this is without loss of generality, as long as all coordinates of the original input are $w$-bit integers for arbitrary $w\ge\log n$ in a unit-cost RAM with a word length of $w$ bits.

Section A.3 of [8] gives an efficient reduction from real inputs to the case that

$$1\le\min_{a\in A,\,b\in B}\|a-b\|_1\le\max_{a\in A,\,b\in B}\|a-b\|_1\le\mathrm{poly}(n),$$

i.e., the input has a poly(n)-bounded aspect ratio. Their reduction can be adapted to our case as follows:

Claim 27 (Lemma A.3 of [8]).

Given an $\mathsf{est}$ such that $\mathsf{CH}(A,B)\le\mathsf{est}\le\mathrm{poly}(n)\cdot\mathsf{CH}(A,B)$, if there exists an algorithm that computes a $(1+\varepsilon)$-approximation to $\mathsf{CH}(A,B)$ in $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$ time under the assumption that $A,B$ contain points from $\{1,\dots,\mathrm{poly}(n)\}^d$, then there exists an algorithm that computes a $(1+\varepsilon)$-approximation to $\mathsf{CH}(A,B)$ for any integer-coordinate $A,B$ in asymptotically the same time.

It remains to show how to obtain a poly(n)-approximation.

Lemma 28.

There exists an $\mathcal{O}(nd+n\log\log n)$-time algorithm that computes an $\mathsf{est}$ which satisfies $\mathsf{CH}(A,B)\le\mathsf{est}\le\mathrm{poly}(n)\cdot\mathsf{CH}(A,B)$ with probability $1-\frac{1}{n}$.

Proof.

Similar to (the proof of Lemma A.3 in) [8], we sample a vector $v\sim(\mathsf{Cauchy}(0,1))^d$, which can be discretized to $\mathcal{O}(\log n)$-bit precision following [13]. We then compute the inner products $\{v\cdot a\}_{a\in A}$ and $\{v\cdot b\}_{b\in B}$. The random variable $v\cdot a-v\cdot b$ follows the distribution $\mathsf{Cauchy}(0,\|a-b\|_1)$ by the 1-stability property of Cauchy random variables. So we have that for every $a\in A$ and $b\in B$,

$$\frac{\|a-b\|_1}{\mathrm{poly}(n)}\le|v\cdot a-v\cdot b|\le\|a-b\|_1\cdot\mathrm{poly}(n),$$

with probability $1-1/\mathrm{poly}(n)$. Therefore, $\mathsf{est}:=\mathsf{CH}(\{v\cdot a\}_{a\in A},\{v\cdot b\}_{b\in B})$ is a $\mathrm{poly}(n)$-approximation to $\mathsf{CH}(A,B)$. We may assume by scaling that $\{v\cdot a\}_{a\in A},\{v\cdot b\}_{b\in B}$ contain $w$-bit integers, which can be sorted in $\mathcal{O}(n\log\log n)$ time [4]. Then, to compute $\mathsf{est}$, we find all one-dimensional nearest neighbors by going through the sorted list and linking each $v\cdot a$ for $a\in A$ with its adjacent $v\cdot b$ for $b\in B$, which takes $\mathcal{O}(n)$ time. Thus the total runtime is $\mathcal{O}(nd+n\log\log n)$ as claimed.
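A small Python sketch of this one-dimensional reduction (ours; the discretization of the Cauchy vector and the integer scaling are omitted, and a binary search replaces the sort-and-scan for brevity):

import numpy as np

def crude_estimate(A, B, rng):
    """poly(n)-approximation of CH(A, B) from a single Cauchy projection (sketch)."""
    v = rng.standard_cauchy(A.shape[1])          # 1-stable projection vector
    a_proj, b_proj = A @ v, np.sort(B @ v)
    # one-dimensional nearest neighbor of each projected a among the projected b's
    idx = np.searchsorted(b_proj, a_proj)
    lo = np.clip(idx - 1, 0, len(b_proj) - 1)
    hi = np.clip(idx, 0, len(b_proj) - 1)
    nearest = np.minimum(np.abs(a_proj - b_proj[lo]), np.abs(a_proj - b_proj[hi]))
    return nearest.sum()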

Appendix B Proof of Lemma 11

Proof.

We use the fact that for $v\sim(\mathsf{Cauchy}(0,1))^d$ and any $x\in\mathbb{R}^d$, $v\cdot x\sim\mathsf{Cauchy}(0,\|x\|_1)$. Also, for any $k>0$, if a random variable $z\sim\mathsf{Cauchy}(0,1)$ then $kz\sim\mathsf{Cauchy}(0,k)$. Therefore, for any $v_i$, $i\in[r]$, $\Pr\big[|v_i\cdot(x-y)|>(1+c)\|x-y\|_1\big]=\Pr[U>1+c]$ where $U\sim\mathsf{HalfCauchy}(0,1)$. The density of $U$ is $f_U(u)=\frac{2}{\pi}\cdot\frac{1}{1+u^2}$, thus $\Pr[U>1]=1/2$ and

$$\Pr[U>1+c]=\frac{1}{2}-\int_{1}^{1+c}f_U(u)\,du\le\frac{1}{2}-c\cdot f_U(3/2)\ \ \text{(for $0<c<1/2$)}<\frac{1}{2}-c/10.$$

Similarly, we can get $\Pr\big[|v_i\cdot(x-y)|<(1-c)\|x-y\|_1\big]<\frac{1}{2}-c/10$. For $i\in[r]$, let $\mathcal{E}_i$ be an indicator variable that equals 1 if $|v_i\cdot(x-y)|<(1-c)\|x-y\|_1$ and equals 0 otherwise. By Hoeffding's bound,

$$\Pr\Big[\sum_{i\in[r]}\mathcal{E}_i\ge\frac{r}{2}\Big]<e^{-2rc^2/100},$$

which upper bounds the failure probability that the median is too small. We symmetrically bound the probability that the median is too large. Then

$$\Pr\Big[\mathsf{med}\{|v_i\cdot(x-y)|:i\in[r]\}\in(1\pm c)\|x-y\|_1\Big]\ge 1-2e^{-rc^2/50}.$$