Abstract 1 Introduction 2 Upper bounds on 𝒇𝒅(𝒏,𝒎) 3 Lower bounds on 𝒇𝒅(𝒏,𝒎) 4 The Data Structure 5 Future Work References

Tight Bounds on the Number of Closest Pairs in Vertical Slabs

Ahmad Biniaz School of Computer Science, University of Windsor, Canada Prosenjit Bose ORCID School of Computer Science, Carleton University, Ottawa, Canada Chaeyoon Chung Department of Computer Science and
Engineering, Pohang University of Science
and Technology, Republic of Korea
Jean-Lou De Carufel ORCID School of Electrical Engineering and
Computer Science, University of Ottawa, Canada
John Iacono ORCID Department of Computer Science, Université libre de Bruxelles, Brussels, Belgium Anil Maheshwari ORCID School of Computer Science, Carleton University, Ottawa, Canada Saeed Odak School of Electrical Engineering and
Computer Science, University of Ottawa, Canada
Michiel Smid School of Computer Science, Carleton University, Ottawa, Canada Csaba D. Tóth ORCID Department of Mathematics, California State
University Northridge, Los Angeles, CA, USA
Department of Computer Science, Tufts University, Medford, MA, USA
Abstract

Let S be a set of n points in d, where d2 is a constant, and let H1,H2,,Hm+1 be a sequence of vertical hyperplanes that are sorted by their first coordinates, such that exactly n/m points of S are between any two successive hyperplanes. Let |A(S,m)| be the number of different closest pairs in the (m+12) vertical slabs that are bounded by Hi and Hj, over all 1i<jm+1. We prove tight bounds for the largest possible value of |A(S,m)|, over all point sets of size n, and for all values of 1mn.

As a result of these bounds, we obtain, for any constant ε>0, a data structure of size O(n), such that for any vertical query slab Q, the closest pair in the set QS can be reported in O(n1/2+ε) time. Prior to this work, no linear space data structure with sublinear query time was known.

Keywords and phrases:
closest pair, vertical slab, data structure
Funding:
Ahmad Biniaz: Research supported by NSERC.
Prosenjit Bose: Research supported by NSERC.
Chaeyoon Chung: Research supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT).
Jean-Lou De Carufel: Research supported by NSERC.
John Iacono: This work was supported by the Fonds de la Recherche Scientifique-FNRS.
Anil Maheshwari: Research supported by NSERC.
Saeed Odak: Research supported by NSERC.
Michiel Smid: Research supported by NSERC.
Csaba D. Tóth: Research supported in part by the NSF award DMS-2154347.
Copyright and License:
[Uncaptioned image] © Ahmad Biniaz, Prosenjit Bose, Chaeyoon Chung, Jean-Lou De Carufel, John Iacono,
Anil Maheshwari, Saeed Odak, Michiel Smid, and Csaba D. Tóth; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Communication complexity
; Mathematics of computing Combinatorial algorithms
Related Version:
Full Version: https://arxiv.org/abs/2502.17600
Acknowledgements:
Research on this paper was conducted, in part, at the Eleventh Annual Workshop on Geometry and Graphs, held at the Bellairs Research Institute in Barbados, March 8–15, 2024. The authors are grateful to the organizers and to the participants of this workshop.
Editors:
Pat Morin and Eunjin Oh

1 Introduction

Throughout this paper, we consider point sets in d, where the dimension d is an integer constant. For any real number a, we define the vertical hyperplane Ha to be the set

Ha={(x1,x2,,xd)d:x1=a}.

Note that this is a hyperplane with normal vector (1,0,0,,0). For any two real numbers a and b with a<b, we define the vertical slab Ha,Hb to be the set

Ha,Hb={(x1,x2,,xd)d:ax1b}.

Let S be a set of n points in d, in which no two points have the same first coordinate and all (n2) pairwise Euclidean distances are distinct.

For any two real numbers a and b with a<b, we define 𝐶𝑃(S,Ha,Hb) to be the closest-pair among all points in the set Ha,HbS, i.e., all points of S that are in the vertical slab Ha,Hb. If Ha,HbS has size at most one, then 𝐶𝑃(S,Ha,Hb)=.

Clearly, there are Θ(n2) combinatorially different111The slabs Ha,Hb and Ha,Hb are combinatorially different if their intersections with S are different. sets of the form Ha,HbS. Sharathkumar and Gupta [5] have shown that, for d=2, the size of the set

{𝐶𝑃(S,Ha,Hb):a<b}

is only O(nlogn). That is, even though there are Θ(n2) combinatorially different vertical slabs with respect to S, the number of different closest pairs in these slabs is only O(nlogn).

In this paper, we generalize this result to the case when the dimension d can be any constant and the slabs Ha,Hb come from a restricted set.

Let m be an integer with 1mn, and let a1<a2<<am+1 be real numbers such that for each i=1,2,,m, there are exactly222In order to avoid floors and ceilings, we assume for simplicity that n is a multiple of m. n/m points of S in the interior of the vertical slab Hai,Hai+1. Observe that this implies that all points in S are in the interior of the vertical slab Ha1,Ham+1.

We define

A(S,m)={𝐶𝑃(S,Hai,Haj):1i<jm+1}.

That is, |A(S,m)| is the number of different closest pairs over all (m+12) slabs bounded by vertical hyperplanes whose first coordinates belong to {a1,a2,,am+1}. Finally, we define

fd(n,m)=max{|A(S,m)|:|S|=n}.

Using this notation, Sharathkumar and Gupta [5] have shown that f2(n,n)=O(nlogn).

In dimension d=1, it is easy to see that f1(n,m)=Θ(m). Our main results are the following tight bounds on fd(n,m), for any constant d2 and any m with 1mn:

Theorem 1.

Let d2 be a constant, and let m and n be integers such that 1mn.

  1. 1.

    If m=O(n), then fd(n,m)=Θ(m2).

  2. 2.

    If m=ω(n), then fd(n,m)=Θ(nlog(m2/n)).

  3. 3.

    In particular, if m=n, then fd(n,m)=Θ(nlogn).

1.1 Motivation

In the range closest pair problem, we have to store a given set S of n points in d in a data structure such that queries of the following type can be answered: Given a query range R in d, report the closest pair among all points in the set RS.

Many results are known for different classes of query ranges. We mention some of the currently best data structures. Xue et al. [9] present data structures for the case when d=2 and the query ranges are quadrants, halfplanes, or axes-parallel rectangles. Again for the case when d=2, data structures for query regions that are translates of a fixed shape are given by Xue et al. [8]. Some results in any constant dimension d3 are given by Chan et al. [3]. Xue [7] considers colored point sets, where the goal is to report the closest pair of points with different colors that are inside a query range. For constant dimension d2, [7] presents data structures for different types of query regions that report (1+ε)-approximations for the closest pair with different colors. References to many other data structures can be found in [3, 8, 9].

Most of the currently known data structures use super-linear space. To the best of our knowledge, linear-sized data structures are known only for the following classes of regions, all in dimension d=2: Quadrants and halfplanes [9], and translates of a fixed polygon (possibly with holes) [8]. In all these three cases, the query time is O(logn).

If each query range R is a vertical slab Ha,Hb, we refer to the problem as the vertical slab closest pair problem. In dimension d=1, it is easy to obtain a data structure of size O(n) such that the closest pair in any “vertical slab” (i.e., interval on the real line) can be computed in O(logn) time. In dimension d=2, Sharathkumar and Gupta [5] gave a data structure of size O(nlog2n) that allows queries to be answered in O(logn) time. Xue et al. [9] improved the space bound to O(nlogn), while keeping a query time of O(logn). Both these results use the fact that f2(n,n)=O(nlogn). In fact, both data structures explicitly store the set {𝐶𝑃(S,Ha,Hb):a<b}, whose size is equal to f2(n,n) in the worst case.

The starting point of our work was to design a data structure of size O(n) for vertical slab closest pair queries. This led us to the problem of determining the asymptotic value of the function fd(n,m). Using our bounds in Theorem 1, we will obtain the following result.

Theorem 2.

Let d2 be an integer constant and let ε>0 be a real constant. For every set S of n points in d, there exists a data structure of size O(n) that allows vertical slab closest pair queries to be answered in O(n1/2+ε) time.

Note that, prior to our work, no O(n)-space data structure with a query time of o(n) was known for d2.

Organization.

In Section 2, we will present the upper bounds in Theorem 1 on fd(n,m). The corresponding lower bounds will be given in Section 3. The data structure in Theorem 2 will be presented in Section 4. We conclude in Section 5 with some open problems.

Notation and Terminology.

Throughout the rest of this paper, the notions of left and right in d will always refer to the ordering in the first coordinate. That is, if p=(p1,p2,,pd) and q=(q1,q2,,qd) are two points in d with p1<q1, then we say that p is to the left of q, and q is to the right of p. For a vertical hyperplane Ha, we say that p is to the left of Ha if p1<a. If p1>a, then p is to the right of Ha.

The Euclidean distance between any two points p and q in d will be denoted by pq. The length, or norm, of any vector v will be denoted by v.

2 Upper bounds on 𝒇𝒅(𝒏,𝒎)

Let d2 be a constant, let m and n be integers with 1mn, and let S be a set of n points in d. Let a1<a2<<am+1 be real numbers such that for each i=1,2,,m, there are exactly n/m points in S between the vertical hyperplanes Hai and Hai+1.

For any m, it is clear that fd(n,m)=O(m2), because there are (m+12) vertical slabs of the form Hai,Haj. Thus, the upper bound in Theorem 1 holds when m=O(n). In the rest of this section, we assume that m=ω(n).

The following lemma was proved by Sharathkumar and Gupta [5] for the case when d=2. This lemma will be the key tool to prove our upper bound on fd(n,m).

Lemma 3.

Let S be a set of n points in d and let 𝒞𝒫 be the set of segments corresponding to the elements of A(S,n). That is, for each pair in A(S,n), the set 𝒞𝒫 contains the line segment connecting the two points in this pair. For any vertical hyperplane H, the number of elements of 𝒞𝒫 that cross H is O(n).

We first present a proof of this lemma for the case when d=2. We believe that our proof is simpler than the one in [5]. Afterwards, we present a proof for any dimension d2.

Proof of Lemma 3 when 𝒅=𝟐.

We write for the vertical line. We define a graph, G+, with vertex set S. Each segment of 𝒞𝒫 with a positive slope represents an edge in the graph G+. Let F be the subgraph of G+ induced by the segments of 𝒞𝒫 that cross . We will show that F does not contain a cycle.

Suppose, to the contrary, that there is a cycle C in F. Let a and b be the endpoints of the shortest edge in C such that a is to the left of and b is to the right of . Let ac and bd be the other edges of the cycle that are incident to a and b, respectively. Since both ab and ac represent pairs in 𝒞𝒫 and both have a positive slope, we have ax<cx<bx and ay<by<cy. Similarly, we have ax<dx<bx and dy<ay<by; see Figure 1.

Figure 1: The pairs in A(S,n) with positive slope that cross do not contain a cycle.

Let c be the reflection of the point c with respect to the horizontal line through b. Note that bc=bc>bd, because bd represents a pair in A(S,n) and the vertical slab bx,dx contains the point c. Since bc>bd, we have cy<dy. We also have dy<ay and ax<dx<cx=cx. It follows that ad<ac.

Consider the bisector of the segment cc (which is the horizontal line through b). Observe that the point a is located on the same side as c with respect to this bisector. Therefore, ac<ac. Combined with ad<ac, this implies that ad<ac. This contradicts the facts that ac represents a pair in A(S,n) and the point d is in the slab ax,cx.

A similar argument shows that the segments in 𝒞𝒫 that cross and have negative slopes do not contain a cycle. Therefore, the total number of segments in 𝒞𝒫 that cross the line is O(n).

To prove Lemma 3 for dimensions d2, we will use the Well-Separated Pair Decomposition (WSPD), as introduced by Callahan and Kosaraju [2]. Let S be a set of n points in d and let s>1 be a real number, called the separation ratio. A WSPD for S is a set of pairs {Ai,Bi}, for i=1,2,,k, for some positive integer k, such that

  1. 1.

    for each i, AiS and BiS,

  2. 2.

    for each i, there exist two balls D and D of the same radius, say ρ, such that AiD, BiD, and the distance between D and D is at least sρ, i.e., the distance between their centers is at least (s+2)ρ,

  3. 3.

    for any two distinct points p and q in S, there is a unique index i such that pAi and qBi or vice-versa.

Consider a pair {Ai,Bi} in a WSPD. If p and p are two points in Ai and q is a point in Bi, then it is easy to see that

pp(2/s)pq. (1)
Lemma 4 (Callahan and Kosaraju [2]).

Let S be a set of n points in d, and let s>1 be a real number. A well-separated pair decomposition for S, with separation ratio s, consisting of O(sdn) pairs, can be computed in O(nlogn+sdn) time.

Proof of Lemma 3.

Let s>2 be a constant and consider a WSPD {Ai,Bi}, i=1,2,,k, for the point set S with separation ratio s, where k=O(n); see Lemma 4. We define the following geometric graph G on the point set S. For each i with 1ik, let

  • ai be the rightmost point in Ai that is to the left of H,

  • bi be the leftmost point in Bi that is to the right of H,

  • ai be the leftmost point in Ai that is to the right of H, and

  • bi be the rightmost point in Bi that is to the left of H.

We add the edges aibi and aibi to the graph G. Note that some of these points may not exist, in which case we ignore the corresponding edge. It is clear that G has O(n) edges. The lemma will follow from the fact that every segment in 𝒞𝒫 that crosses H is an edge in G.

Let pq be a pair in 𝒞𝒫 that crosses H, and let Q be a vertical slab such that pq is the closest pair in QS. We may assume, without loss of generality, that p is to the left of H and q is to the right of H. Let i be the index such that (i) pAi and qBi or (ii) pBi and qAi. We may assume, without loss of generality, that (i) holds.

We claim that p=ai. To prove this, suppose that pai. Then, since p is to the left of ai, ai is in the slab Q. Since s>2, Equation (1) yields pai<pq, which is a contradiction. By a symmetric argument, we have q=bi. Thus, pq is an edge in G.

Lemma 3 gives us a divide-and-conquer approach to prove an upper bound on fd(n,m):

Theorem 5.

Let d2 be a constant, and let m and n be integers with m=ω(n) and mn. Then fd(n,m)=O(nlog(m2/n)).

Proof.

Let S be a set of n points in d for which fd(n,m)=|A(S,m)|. Let a1<a2<<am+1 be real numbers such that for each i=1,2,,m, there are exactly n/m points in S that are strictly inside the vertical slab Hai,Hai+1.

Let H=Ha1+m/2. Observe that n/2 points of S are to the left of H and n/2 points of S are to the right of H. Denote these two subsets by S and S+, respectively. Each pair in A(S,m) is either a pair in A(S,m/2) or a pair in A(S+,m/2) or it crosses H. Using Lemma 3, it follows that

fd(n,m)=|A(S,m)|=|A(S,m/2)|+|A(S+,m/2)|+O(n)2fd(n/2,m/2)+O(n).

If we apply this recurrence k times, we get

fd(n,m)2kfd(n/2k,m/2k)+O(kn).

For k=log(m2/n), we have n/2k=n2/m2 and m/2k=n/m. Thus,

fd(n,m)m2nfd(n2/m2,n/m)+O(nlog(m2/n)).

Since fd(n2/m2,n/m)=O(n2/m2), we conclude that

fd(n,m)=O(n+nlog(m2/n))=O(nlog(m2/n)).

3 Lower bounds on 𝒇𝒅(𝒏,𝒎)

In this section, we prove the lower bounds on fd(n,m) as stated in Theorem 1. We will prove these lower bounds for the case when d=2. It is clear that this will imply the same lower bound for any constant dimension d2.

Theorem 6.

Let n and m be positive integers with nm(m+1). Then f2(n,m)=(m+12).

Proof.

It is clear that f2(n,m)(m+12). To prove the lower bound, we will construct a set S of n points in 2 such that the (m+12) vertical slabs have distinct closest pairs.

For i=1,2,,m+1, we take ai=i and consider the corresponding hyperplane Hi. Let 𝒬={Hi,Hj:1i<jm+1} be the set of all possible vertical slabs. We define the size of a slab Hi,Hj to be the difference ji of their indices.

We start by constructing a set P of m(m+1) points such that the slabs in 𝒬 contain distinct closest pairs in P, and for each i=1,2,,m, the slab Hi,Hi+1 contains exactly m+1 points of P.

Note that the slab H1,Hm+1 has the largest size. Let p be an arbitrary point in H1,H2 and let q be an arbitrary point in Hm,Hm+1. We initialize P={p,q}, D=pq, and delete the slab H1,Hm+1 from 𝒬.

As long as 𝒬 is non-empty, we do the following:

  • Take a slab Hi,Hj of largest size in 𝒬.

  • Let p be an arbitrary point in Hi,Hi+1 such that p is above the bounding box of P, and the distance between p and any point in P is more than D+2.

  • Let q be an arbitrary point in Hj1,Hj such that q is above the bounding box of P, the distance between q and any point in P is more than D+2, and pq=D+1.

  • Add p and q to P.

  • Set D=pq.

  • Delete the slab Hi,Hj from 𝒬.

It is not difficult to see that the final point set P has the properties stated above.

To obtain the final point set S, of size n, we define a set P of nm(m+1) points, such that each point in P has distance more than D to all points of P, the closest pair distance in P is more than D, and for each i=1,2,,m, the slab Hi,Hi+1 contains n/m(m1) points of P. The point set S=PP has the property that |A(S,m)|=(m+12).

Corollary 7.

Let n and m be sufficiently large positive integers with n<m(m+1) and m3n. Then f2(n,m)=Ω(m2).

Proof.

For i=1,2,,m+1, we take ai=i and consider the corresponding hyperplane Hi.

Let m=n/4 and n=m(m+1). We apply Theorem 6, where we replace n by n and m by m. This gives us a set S of n points with |A(S,m)|=f2(n,m). The points of S are between the hyperplanes H1 and Hm+1; for each i=1,2,,m, the vertical slab Hi,Hi+1 contains n/m points of S. Note that

|A(S,m)|=(m+12)=Ω((m)2).

Let D be the diameter of S. Let S be the union of S and a set of nn additional points that have pairwise distances more than D, whose distances to the points in S are more than D, and such that for each i=1,2,,m, the vertical slab Hi,Hi+1 contains n/m points of S. It is clear that

f2(n,m)|A(S,m)|=Ω((m)2).

Note that this construction is possible, because (i) n<n, (ii) m<m, and (iii) n/m<n/m; these inequalities follow by straightforward algebraic manipulations, using the assumptions on n and m in the statement of the corollary. Finally, these assumptions imply that mm/12. We conclude that f2(n,m)=Ω(m2).

Before we prove the lower bound for the remaining case, i.e., m>3n, we consider the case when m=n, which will serve as a warm up.

Theorem 8.

We have f2(n,n)=Ω(nlogn).

Proof.

We assume for simplicity that n is a sufficiently large power of two. We will construct a point set S of size n for which |A(S,n)|=Ω(nlogn).

Let k=logn. For i=0,1,,k1, let xi=2i and let vi=(xi,yi) be a vector, where the value of yi is inductively defined as follows: We set yk1=0. Assuming that yk1,yk2,,yi+1 have been defined, we set yi to an integer such that

vi>2j=i+1k1vj. (2)

We define

S={i=0k1βivi:(β0,β1,,βk1){0,1}k}.

Note that each binary sequence of length k represents a unique point in S. Using this representation, each point of S corresponds to a vertex of a k-dimensional hypercube Qk. We will prove below that each edge of Qk corresponds to a closest pair in a unique vertical slab. Since Qk has k2k1=Ω(nlogn) edges, this will complete the proof.

Consider an arbitrary edge of Qk. The two vertices of this edge are binary sequences of length k that have Hamming distance one, i.e., they differ in exactly one bit. Let t be the position at which they differ. Observe that 0tk1. Let r and s be the points of S that correspond to the two vertices of this edge. Then vt is either rs or sr. We will prove that r and s form the closest pair in the vertical slab Hr1,Hs1, where r1 and s1 are the first coordinates of r and s, respectively (assuming that r1<s1). Note that r1 and s1 are integers.

Let p and q be two points in Hr1,Hs1S such that {p,q}{r,s}. We have to show that rs<pq. Since p and q are points in S, we can write them as

p=i=0k1βp,ivi and q=i=0k1βq,ivi.

Let be the smallest index for which βp,βq,. Then

pq=i=k1(βp,iβq,i)vi.

By the triangle inequality, we have a+ba+b and a+bab for any two vectors a and b. These two inequalities, together with (2), imply that

pqvi=+1k1vi>12v.

Thus, it suffices to show that v2rs=2vt. If we can show that <t, then, using (2),

v>2i=+1k1vi2vt

and the proof is complete.

The horizontal distance between p and q (i.e., |p1q1|) is smaller than the horizontal distance between r and s. Recall that vt is either rs or sr. Thus, the horizontal distance between r and s is equal to the first coordinate of the vector vt, which is xt.

Let be the largest index for which βp,βq,. Note that . If =, then

pq=(βp,βq,)v

and

|p1q1|=|βp,βq,|x=x.

Now assume that <. We may assume, without loss of generality, that βp,=1 and βq,=0. Then (recall that xi=2i),

|p1q1| = |x+i=1(βp,iβq,i)xi|=|2+i=1(βp,iβq,i)2i|2i=12i=2=x.

We conclude that x|p1q1|<xt, which is equivalent to <t.

Theorem 9.

Let n and m be positive integers with 3n<mn. Then f2(n,m)=Ω(nlog(m2/n)).

Proof.

Our approach will be to use multiple scaled and shifted copies of the construction in Theorem 8 to define a set S of n points in 2 for which |A(S,m)|=Ω(nlog(m2/n)).

For i=1,2,,m+1, we take ai=i and consider the corresponding vertical hyperplane Hi. For each i=1,2,,m, the point set S will contain exactly n/m points in the vertical slab Hi,Hi+1.

Throughout the proof, we will use the following notation. Let v0,v1,,vk1 be a sequence of pairwise distinct vectors in the plane. The hypercube-set defined by these vectors is the point set

Q(v0,v1,,vk1)={i=0k1βivi:(β0,β1,,βk1){0,1}k}.

For each g=1,2,,n/m, we define a hypercube-set Qg:

  • Let kg=log(m/(2g1)).

  • For each i=0,1,,kg1, let xg,i=(2g1)2i and let vg,i=(xg,i,yg,i) be a vector, whose second coordinate yg,i will be defined later.

  • Let Qg=Q(vg,0,vg,1,,vg,kg1).

Since, for integers g, g, i, and i, (2g1)2i=(2g1)2i if and only if g=g and i=i, then all values xg,i are pairwise distinct.

To define the values yg,i, we sort all vectors vg,i by their first coordinates. We go through the sorted sequence in decreasing order:

  • For the vector with the largest first coordinate, we set its y-value to zero.

  • For each subsequent vector vg,i, we set yg,i to be an integer such that

    vg,i>2g,ivg,i, (3)

    where the summation is over all pairs g,i for which yg,i has already been defined (i.e., xg,i<xg,i).

We choose pairwise distinct real numbers 0<εg<1, for g=1,2,,n/m, and set

Δ=1+max{diam(Qg):1gn/m},

where diam(Qg) denotes the diameter of the point set Qg.

For each g=1,2,,n/m and i=1,2,,2g1, let

Sg,i=Qg+(i+εg,2((g1)2+i1)Δ),

that is, Sg,i is the translate of Qg by the vector (i+εg,2((g1)2+i1)Δ). We define S to be the union of all these sets Sg,i, i.e.,

S=g=1n/mi=12g1Sg,i.

Note that the sets Sg,i are pairwise disjoint: Indeed if gg or ii, then the y-projections of Sg,i and Sg,i (i.e., the sets of second coordinates) are disjoint by construction. Consequently, the size of the union of these point sets satisfies

|S|=g=1n/mi=12g1|Sg,i|=g=1n/mi=12g12kgg=1n/m(2g1)m2g1=n.

For each 1gn/m, by construction of Qg and the fact that the sets Sg,i are disjoint translations of Qg, each slab Hj,Hj+1 contains at most one point of i=12g1Sg,i. Therefore, each slab Hj,Hj+1 contains at most n/m points of S.

To obtain the final point set S of size n, we add n|S| points to S such that each slab Hj,Hj+1 contains exactly n/m points of S, and the added points are sufficiently far from each other and from all points of S.

In the rest of this proof, we will prove the following claim: For each g=1,2,,n/m, consider two binary strings of length kg that differ in exactly one position (recall that the number of such pairs of strings is equal to kg2kg1). These strings correspond to two points of the hypercube-set Qg. Thus, for any i=1,2,,2g1, they correspond to two points, say r and s, in the set Sg,i. We claim that r and s form the closest pair in the set Hr1,Hs1S, where r1 and s1 are the first coordinates of r and s, respectively (assuming that r1<s1). Note that we take the floor and the ceiling, because r1 and s1 are not integers. This claim will imply that

f2(n,m)|A(S,m)|g=1n/mi=12g1kg2kg1.

Since kg>log(m2g1)1, we get

f2(n,m) g=1n/m(2g1)(log(m2g1)1)m4(2g1)
= g=1n/mm4log(m2g1)g=1n/mm4.

Since each term in the first summation is larger than the last term, which is larger than (m/4)log(m2/(2n)), we get

f2(n,m) nmm4log(m2/(2n))n4=n4(log(m2/(2n))1).

Since m>3n,

log(m2/(2n))1=Ω(log(m2/n)).

We conclude that

f2(n,m)=Ω(nlog(m2/n)).

It remains to prove the above claim. Let g and i be integers with 1gn/m and 1i2g1. Consider two binary strings of length kg that differ in exactly one position; denote this position by t. Let r and s be the two corresponding points of Sg,i. Note that the vector vg,t is equal to either rs or sr.

We may assume, without loss of generality, that r1<s1. To prove that r and s form the closest pair in the set Hr1,Hs1S, we consider an arbitrary pair p and q of points in Hr1,Hs1S such that {p,q}{r,s}. We will show that rs<pq.

Let g, g′′, i, and i′′ be such that pSg,i and qSg′′,i′′. If gg′′ or ii′′, then

pq|p2q2|Δ>diam(Qg)rs.

In the rest of the proof, we assume that g=g′′ and i=i′′. Since both p and q are in Sg,i, we can write them as

p = j=0kg1βp,jvg,j+(i+εg,2((g1)2+i1)Δ)and
q = j=0kg1βq,jvg,j+(i+εg,2((g1)2+i1)Δ).

Let be the smallest index such that 0kg1 and βp,βq,. Using the triangle inequality and (3), we have

pq=j=kg1(βp,jβq,j)vg,jvg,j=+1kg1vg,j>12vg,.

Thus, it suffices to show that

vg,2rs=2vg,t.

Since both p and q are in Sg,i, we have |p1q1||r1s1| (where r and s are each at distance εg from the left boundary of their corresponding slabs, and p and q are at distance εg from the left boundary of their corresponding slabs). Let be the largest index such that 0kg1 and βp,βq,. We have

xg,t = |r1s1||p1q1|=|j=(βp,jβq,j)xg,j|xg,j=1xg,j
= (2g1)(2j=12j)=(2g1)2=xg,.

Since {p,q}{r,s}, then xg,t cannot be equal to xg,. Therefore, xg,t>xg,. Using (3), it then follows that vg,2vg,t.

4 The Data Structure

In this section, we will present a data structure for vertical closest pair queries. Our data structure will use the results in the following three lemmas.

Lemma 10.

Let S be a set of n points in d and let L be a set of k line segments such that the endpoints of each segment belongs to S. There exists a data structure of size O(n+k), such that for any two real numbers a and b with a<b, the shortest segment in L that is completely inside the vertical slab Ha,Hb can be reported in O(logn) time.

Proof.

Xue et al. [9, Section 3] proved the claim in the case where d=2. A careful analysis of their construction shows that the claim in fact holds for any constant dimension d2.

The next lemma is due to Mehlhorn [4, page 44]; see Smid [6] for a complete analysis of this data structure.

Lemma 11.

Let S be a set of n points in d and let ε>0 be a real constant. There exists a data structure of size O(n), such that for any axis-parallel rectangular box B, all points in BS can be reported in O(nε+|BS|) time.

The last tool that we need is a standard sparsity property.

Lemma 12.

Let r>0 be a real number, and let X be a set of points in d that are contained in an r×2r×2r××2r rectangular box B. If the distance of the closest pair of points in X is at least r, then |X|2d1cd, where c=1+d.

In the rest of this section, we will prove the following result.

Theorem 13.

Let S be a set of n points in d, let m be an integer with 1mn, and let ε>0 be a real constant. There exists a data structure of size O(n+fd(n,m)) such that for any two real numbers a and b with a<b, the closest pair in the vertical slab Ha,Hb can be reported in O(n1+ε/m) time.

Proof.

Let a1<a2<<am+1 be real numbers such that for each i=1,2,,m, the vertical slab Hai,Hai+1 contains n/m points of S. Let k=|A(S,m)|. Note that kfd(n,m). Our data structure consists of the following components:

  • An array storing the numbers a1,a2,,am+1. For each i=1,2,,m, the i-th entry stores, besides the number ai, a list of all points in Hai,Hai+1S.

  • The data structure of Lemma 10, where L is the set of line segments corresponding to the pairs in {𝐶𝑃(S,Hai,Haj):1i<jm+1}.

  • The data structure of Lemma 11.

The size of the entire data structure is O(m+n+k), which is O(n+fd(n,m)).

We next describe the query algorithm. Let a and b be real numbers with a<b. Using binary search, we compute, in O(logm)=O(n1+ε/m) time, the indices i and j such that Ha is in the slab Hai1,Hai and Hb is in the slab Haj,Haj+1.

If i=j, then the slab Ha,Hb contains O(n/m) points of S. In this case, we use the algorithm of Bentley and Shamos [1] to compute the closest pair in Ha,Hb in O((n/m)log(n/m))=O(n1+ε/m) time.

Figure 2: (A), (B), and (C) are the three regions created by a query Ha,Hb. The rectangle Rp is the range query for the point p with respect to the query Ha,Hb.

Assume that i<j. The two hyperplanes Hai and Haj split the query slab Ha,Hb into three parts (A), (B), and (C), where (A) is the slab Ha,Hai, (B) is the slab Hai,Haj, and (C) is the slab Haj,Hb; refer to Figure 2.

Let SAC be the set of points in S that are in the union of (A) and (C), and let SB be the set of points in S that are in (B). There are three possibilities for the closest pair in Ha,Hb: Both endpoints are in SAC, both endpoints are in SB, or one endpoint is in SAC and the other endpoint is in SB.

Using the algorithm of Bentley and Shamos [1], we compute the closest pair distance δ1 in SAC, in O((n/m)log(n/m))=O(n1+ε/m) time. Using the data structure of Lemma 10, we compute the closest pair distance δ2 in SB in O(logn)=O(n1+ε/m) time.

Let δ=min(δ1,δ2). In the final part of the query algorithm, we use the data structure of Lemma 11:

  • For each point p in the region (A), we compute the set of all points in S that are in the part, say Pp, of the axes-parallel box

    [p1,p1+δ]×[p2δ,p2+δ]××[pdδ,pd+δ]

    that is to the left of Haj. Then we compute δp, which is the minimum distance between p and any point inside SPp.

  • For each point p in the region (C), we compute the set of all points in S that are in the part, say Pp, of the axes-parallel box

    [p1δ,p1]×[p2δ,p2+δ]××[pdδ,pd+δ]

    that is to the right of Hai. Then we compute δp, which is the minimum distance between p and any point inside SPp.

  • At the end , we return the minimum of δ and min{δp:pSAC}.

By Lemma 12, the boxes Pp and Pp each contain O(1) points of S. In total, there are O(n/m) queries to the data structure of Lemma 11, and each one takes O(nε) time. Thus, this final part of the query algorithm takes O((n/m)nε)=O(n1+ε/m) time.

The proof of Theorem 2 follows by taking m=n in Theorem 13 and using Theorem 1.

5 Future Work

The point sets that we constructed for the lower bounds on fd(n,m) have coordinates that are at least exponential in the number of points. Recall that the spread (also known as aspect ratio) of a point set is the ratio of the diameter and the closest pair distance. It is well-known that the spread of any set of n points in d is Ω(n11/d). It is natural to define fd(n,m,Φ) as the quantity analogous to fd(n,m), where we only consider sets of n points in d having spread at most Φ.

Problem 14.

Determine the value of fd(n,m,Φ).

For any set S of n points in d, where d=2, Xue et al. [9] have presented a data structure of size O(nlogn) that can be used to answer vertical slab closest pair queries in O(logn) time. Our data structure uses only O(n) space and works in any constant dimension d2. However, its query time is O(n1/2+ε).

Problem 15.

Is there a linear space data structure that supports vertical slab closest pair queries in o(n) time, or even in O(polylog(n)) time?

Another interesting research direction is to design linear space data structures for closest pair queries with other types of ranges, such as axes-parallel 3-sided ranges and general axes-parallel rectangular ranges.

References

  • [1] Jon Louis Bentley and Michael I. Shamos. Divide-and-conquer in multidimensional space. In Proceedings of the 8th ACM Symposium on the Theory of Computing, pages 220–230, 1976. doi:10.1145/358841.358850.
  • [2] Paul B. Callahan and S. Rao Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1):67–90, 1995. doi:10.1145/200836.200853.
  • [3] Timothy M. Chan, Saladi Rahul, and Jie Xue. Range closest-pair search in higher dimensions. Comput. Geom., 91:101669, 2020. doi:10.1016/J.COMGEO.2020.101669.
  • [4] Kurt Mehlhorn. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry. Monographs in Theoretical Computer Science. Springer-Verlag, Berlin, 1984. doi:10.1007/978-3-642-69900-9.
  • [5] R. Sharathkumar and Prosenjit Gupta. Range-aggregate proximity queries. Technical Report IIIT/TR/2007/80, International Institute of Information Technology Hyderabad, 2007.
  • [6] Michiel Smid. Range trees with slack parameter. Algorithms Review, 2:77–87, 1991. URL: https://pure.mpg.de/rest/items/item_1834999_3/component/file_2463090/content.
  • [7] Jie Xue. Colored range closest-pair problem under general distance functions. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 373–390, 2019. doi:10.1137/1.9781611975482.24.
  • [8] Jie Xue, Yuan Li, Saladi Rahul, and Ravi Janardan. Searching for the closest-pair in a query translate. J. Comput. Geom., 11(2):26–61, 2020. doi:10.20382/JOCG.V11I2A3.
  • [9] Jie Xue, Yuan Li, Saladi Rahul, and Ravi Janardan. New bounds for range closest-pair problems. Discret. Comput. Geom., 68(1):1–49, 2022. doi:10.1007/S00454-022-00388-7.