New bounds for range closest-pair problems

Given a dataset $S$ of points in $\mathbb{R}^2$, the range closest-pair (RCP) problem aims to preprocess $S$ into a data structure such that when a query range $X$ is specified, the closest-pair in $S \cap X$ can be reported efficiently. The RCP problem can be viewed as a range-search version of the classical closest-pair problem, and finds applications in many areas. Due to its non-decomposability, the RCP problem is much more challenging than many traditional range-search problems. This paper revisits the RCP problem, and proposes new data structures for various query types including quadrants, strips, rectangles, and halfplanes. Both worst-case and average-case analyses (in the sense that the data points are drawn uniformly and independently from the unit square) are applied to these new data structures, which result in new bounds for the RCP problem. Some of the new bounds significantly improve the previous results, while the others are entirely new.


Introduction
The closest-pair problem is one of the most fundamental problems in computational geometry and finds many applications, e.g., collision detection, similarity search, and traffic control. In this paper, we study a range-search version of the closest-pair problem called the range closest-pair (RCP) problem. Let $\mathcal{X}$ be a collection of ranges, called the query space. The RCP problem with query space $\mathcal{X}$ (or the $\mathcal{X}$-RCP problem for short) aims to preprocess a given dataset $S$ of points into a low-space data structure such that when a query range $X \in \mathcal{X}$ is specified, the closest-pair in $S \cap X$ can be reported efficiently. The motivation for the RCP problem is similar to that of range search: in many situations, one is interested in local information (i.e., local closest-pairs) inside specified ranges rather than global information (i.e., the global closest-pair) of the dataset.
The RCP problem is challenging for two reasons. First, the objects of interest in the RCP problem are point-pairs rather than single points, and a dataset contains a quadratic number of point-pairs. Second, the RCP problem is non-decomposable: even if the query range $X \in \mathcal{X}$ can be written as $X = X_1 \cup X_2$, the closest-pair in $S \cap X$ cannot, in general, be computed from the closest-pairs in $S \cap X_1$ and $S \cap X_2$ (for instance, its two points may lie in $X_1 \setminus X_2$ and $X_2 \setminus X_1$, respectively). Non-decomposability makes many traditional range-search techniques inapplicable to the RCP problem, and thus makes the problem much more challenging.
The RCP problem in $\mathbb{R}^2$ has been studied in prior work over recent decades, e.g., [1,5,6,8,9]. In this paper, we revisit this problem and make significant improvements to the existing solutions. Following the existing work, the query types considered in this paper are orthogonal queries (specifically, quadrants, strips, and rectangles) and halfplane queries.

Our contributions, techniques, and related work
The closest-pair problem and range search are both classical topics in computational geometry; see [2,10] for references. The RCP problem is relatively new. The best existing bounds in $\mathbb{R}^2$ and our new results are summarized in Table 1 (Space refers to space cost and Qtime refers to query time), and we give a brief explanation below.

Table 1: Summary of the best existing bounds and our new results for the RCP problem in $\mathbb{R}^2$ (columns: Query, Source, Space, Qtime; each row corresponds to an RCP data structure for the corresponding query space).
• Related work. The RCP problem for orthogonal queries was studied in [6,8,9]. The best known solution for quadrant queries was given by [6], while [9] gave the best known solution for strip queries. For rectangle queries, there are two best known solutions (in terms of worst-case bounds), given by [6] and [9], respectively. These results only considered the worst-case performance of the data structures. The authors of [6] were the first to apply average-case analysis to RCP data structures, in the model where the data points are drawn independently and uniformly from the unit square. Unfortunately, [6] only gave a rectangle RCP data structure with low average-case preprocessing time, while its average-case space cost and query time are even higher than the worst-case counterparts of the data structure given by [9] (worse still, its worst-case space cost is super-quadratic). In fact, in terms of space cost and query time, no nontrivial average-case bounds were known for any kind of query before this paper. The RCP problem for halfplane queries was studied in [1], where two data structures were proposed. We only present the first one in Table 1. The second one (not in the table) has higher space cost and query time than the first, but can be built in $O(n \log^2 n)$ time. Both data structures require (worst-case) super-linear space and polynomial query time.
• Our contributions. In this paper, we improve all of the above results by giving new RCP data structures for various query types; the improvements can be seen in Table 1. In terms of worst-case bounds, the highlights are our rectangle RCP data structure, which simultaneously improves the two best known results (given by [6] and [9]), and our halfplane RCP data structure, which is optimal and significantly improves the bounds in [1]. Furthermore, by applying average-case analysis to our new data structures, we establish the first nontrivial average-case bounds for all the query types studied. Our average-case analysis applies to datasets generated not only in the unit square but also in an arbitrary axes-parallel rectangle. These average-case bounds suggest that our new data structures may perform much better in practice than the worst-case bounds indicate. Finally, we also give an $O(n \log^2 n)$-time algorithm to build our halfplane RCP data structure, matching the preprocessing time in [1]. The preprocessing of our orthogonal RCP data structures is not considered in this paper; we are still investigating this problem.
• Our techniques. An important notion in our techniques is that of a candidate pair, i.e., a pair of data points that is the answer to some RCP query. Our solutions for the quadrant and strip RCP problems use the candidate pairs to construct a planar subdivision and take advantage of point-location techniques to answer queries. The data structures themselves are simple; our main technical contribution here lies in their average-case analysis, which requires a nontrivial study of the expected number of candidate pairs in a random dataset, a question of both geometric and combinatorial interest. Our data structure for the rectangle RCP problem is more subtle; it is constructed by properly combining two simpler data structures, each of which partially achieves the desired bounds. The two simpler data structures share the same high-level framework: each first "decomposes" a rectangle query into four quadrant queries and then simplifies the problem via geometric observations similar to those in the standard divide-and-conquer algorithm for the classical closest-pair problem. The analysis of these data structures is also technically interesting. Our solution for the halfplane RCP problem applies the duality technique to map the candidate pairs to wedges in the dual space and form a planar subdivision, which allows us to solve the problem by point location on the subdivision, similarly to the approach for the quadrant and strip RCP problems. However, unlike the quadrant and strip cases, bounding the complexity of the subdivision here is much more challenging and requires non-obvious observations that exploit the properties of duality and of the problem itself. The average-case bounds of the data structure follow from a technical result bounding the expected number of candidate pairs, whose proof is also nontrivial.
• Organization. Section 1.2 presents the notations and preliminaries used throughout the paper; we suggest that the reader read this section carefully before moving on. Our solutions for quadrant, strip, rectangle, and halfplane queries are presented in Sections 2, 3, 4, and 5, respectively. To make the paper more readable, some technical proofs are deferred to Appendix A.

Notations and Preliminaries
We introduce the notations and preliminaries that are used throughout the paper.
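• Query spaces. The following notations denote various query spaces (i.e., collections of ranges in $\mathbb{R}^2$): $\mathcal{Q}$ quadrants, $\mathcal{P}$ strips, $\mathcal{U}$ 3-sided rectangles, $\mathcal{R}$ rectangles, and $\mathcal{H}$ halfplanes (the quadrants, strips, 3-sided rectangles, and rectangles under consideration are all axes-parallel). Define $\mathcal{Q}^{\nearrow} = \{[x, \infty) \times [y, \infty) : x, y \in \mathbb{R}\} \subseteq \mathcal{Q}$ as the sub-collection of all northeast quadrants, and define $\mathcal{Q}^{\nwarrow}, \mathcal{Q}^{\swarrow}, \mathcal{Q}^{\searrow}$ similarly. Define $\mathcal{P}_v = \{[x_1, x_2] \times \mathbb{R} : x_1, x_2 \in \mathbb{R}\} \subseteq \mathcal{P}$ as the sub-collection of all vertical strips, and similarly $\mathcal{P}_h$ as that of all horizontal strips. If $l$ is a vertical (resp., horizontal) line, an $l$-anchored strip is a vertical (resp., horizontal) strip containing $l$; define $\mathcal{P}_l \subseteq \mathcal{P}$ as the sub-collection of all $l$-anchored strips. Define $\mathcal{U}^{\downarrow} = \{[x_1, x_2] \times (-\infty, y] : x_1, x_2, y \in \mathbb{R}\} \subseteq \mathcal{U}$ as the sub-collection of all bottom-unbounded rectangles, and define $\mathcal{U}^{\uparrow}, \mathcal{U}^{\leftarrow}, \mathcal{U}^{\rightarrow}$ similarly. If $l$ is a non-vertical line, denote by $l^{\uparrow}$ (resp., $l^{\downarrow}$) the halfplane above (resp., below) $l$; define $\mathcal{H}^{\uparrow} \subseteq \mathcal{H}$ (resp., $\mathcal{H}^{\downarrow} \subseteq \mathcal{H}$) as the sub-collection of all such halfplanes.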
• Candidate pairs. For a dataset $S$ and query space $\mathcal{X}$, a candidate pair of $S$ with respect to $\mathcal{X}$ is a pair of points in $S$ that is the closest-pair in $S \cap X$ for some $X \in \mathcal{X}$. We denote by $\Phi(S, \mathcal{X})$ the set of candidate pairs of $S$ with respect to $\mathcal{X}$. If $l$ is a line, we define $\Phi_l(S, \mathcal{X}) \subseteq \Phi(S, \mathcal{X})$ as the subset consisting of the candidate pairs that cross $l$ (i.e., whose two points are on opposite sides of $l$).
• Data structures. For a data structure $\mathcal{D}$, we denote by $\mathcal{D}(S)$ the instance of $\mathcal{D}$ built on the dataset $S$. The notations $\mathrm{Space}(\mathcal{D}(S))$ and $\mathrm{Qtime}(\mathcal{D}(S))$ denote the space cost and query time (i.e., the maximum time for answering a query) of $\mathcal{D}(S)$, respectively.
• Random datasets. If $X$ is a region in $\mathbb{R}^2$ (or more generally in $\mathbb{R}^d$), we write $S \propto X^n$ to mean that $S$ is a dataset of $n$ random points drawn independently from the uniform distribution $\mathrm{Uni}(X)$ on $X$. More generally, if $X_1, \dots, X_n$ are regions in $\mathbb{R}^2$ (or more generally in $\mathbb{R}^d$), we write $S \propto \prod_{i=1}^n X_i$ to mean that $S$ is a dataset of $n$ random points drawn independently from $\mathrm{Uni}(X_1), \dots, \mathrm{Uni}(X_n)$, respectively.
• Other notions. For a point $a \in \mathbb{R}^2$, we denote by $a.x$ and $a.y$ the x-coordinate and y-coordinate of $a$, respectively. For two points $a, b \in \mathbb{R}^d$, we use $\mathrm{dist}(a, b)$ to denote the Euclidean distance between $a$ and $b$, and use $[a, b]$ to denote the segment connecting $a$ and $b$ (in $\mathbb{R}^1$ this coincides with the notation for a closed interval). We say $I_1, \dots, I_n$ are aligned vertical (resp., horizontal) segments in $\mathbb{R}^2$ if there exist $r_1, \dots, r_n, \alpha, \beta \in \mathbb{R}$ such that $I_i = \{r_i\} \times [\alpha, \beta]$ (resp., $I_i = [\alpha, \beta] \times \{r_i\}$). The length of a pair $\phi = (a, b)$ of points is the length of the segment $[a, b]$. For $S \subseteq \mathbb{R}^2$ of size at least 2, $\kappa(S)$ denotes the closest-pair distance of $S$, i.e., the length of the closest-pair in $S$.
The following result regarding the closest-pair distance of a random dataset will be used to bound the expected number of candidate pairs with respect to various query spaces.
In particular, if $R$ is a segment of length $\ell$, then $\mathbb{E}[\kappa^p(A)] = \Theta((\ell/m^2)^p)$.

Figure 1: The subdivision induced by successively overlaying the quadrants.
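The segment case can be sanity-checked numerically for $p = 1$, reading the setting as $A \propto R^m$ with $R = [0, \ell]$. Here a closed form even exists: the smallest gap between $m \ge 2$ i.i.d. uniform points on a segment of length $\ell$ has expectation $\ell/(m^2 - 1)$, which is $\Theta(\ell/m^2)$ as stated. The sketch below compares a Monte Carlo estimate with that formula; the function names, seeds, and trial counts are illustrative choices, not part of the paper.

```python
import random


def min_gap(values):
    """kappa(A) for points on a line: the smallest gap between sorted values."""
    xs = sorted(values)
    return min(b - a for a, b in zip(xs, xs[1:]))


def estimate_expected_kappa(m, length, trials, seed=0):
    """Monte Carlo estimate of E[kappa(A)] for A drawn uniformly from [0, length]^m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min_gap([rng.uniform(0.0, length) for _ in range(m)])
    return total / trials


def exact_expected_kappa(m, length):
    """Closed form of E[kappa(A)] on a segment (p = 1): length / (m^2 - 1)."""
    return length / (m * m - 1)


if __name__ == "__main__":
    for m in (5, 10, 20, 40):
        est = estimate_expected_kappa(m, 1.0, trials=5000)
        # est * m^2 should stay roughly constant, reflecting Theta(length / m^2).
        print(m, est, exact_expected_kappa(m, 1.0), est * m * m)
```

Doubling $m$ should shrink the estimate by roughly a factor of four, which is the $1/m^2$ scaling used later to bound candidate-pair counts.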

Quadrant query
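We consider the RCP problem for quadrant queries, i.e., the $\mathcal{Q}$-RCP problem. By symmetry, it suffices to consider the $\mathcal{Q}^{\nearrow}$-RCP problem. Let $S \subseteq \mathbb{R}^2$ be a dataset of size $n$. Suppose $\Phi(S, \mathcal{Q}^{\nearrow}) = \{\phi_1, \dots, \phi_m\}$ where $\phi_i = (a_i, b_i)$, and assume $\phi_1, \dots, \phi_m$ are sorted in increasing order of their lengths. It is known in [6] that $m = O(n)$. For each $i$, let $W_i = (-\infty, \min\{a_i.x, b_i.x\}] \times (-\infty, \min\{a_i.y, b_i.y\}]$; a query quadrant $Q = [q.x, \infty) \times [q.y, \infty)$ contains $\phi_i$ iff its apex $q$ lies in $W_i$. Since the closest-pair in $S \cap Q$ is a candidate pair and is the shortest candidate pair contained in $Q$, the closest-pair in $S \cap Q$ to be reported is $\phi_\eta$ for $\eta = \min\{i : q \in W_i\}$. We create a planar subdivision $\Gamma$ by successively overlaying $W_1, \dots, W_m$ (see Figure 1). Note that the complexity of $\Gamma$ is $O(m)$, since overlaying each quadrant creates at most two vertices of $\Gamma$. By the above observation, the answer for $Q$ is $\phi_i$ iff $q$ is in the cell $W_i \setminus \bigcup_{j=1}^{i-1} W_j$. Thus, we can use optimal planar point-location data structures (e.g., [4,7]) to solve the problem in $O(m)$ space with $O(\log m)$ query time. Since $m = O(n)$, we obtain a $\mathcal{Q}$-RCP data structure using $O(n)$ space with $O(\log n)$ query time in the worst case.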
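The reduction to "first $W_i$ containing the apex" can be illustrated with a small brute-force sketch. It enumerates $\Phi(S, \mathcal{Q}^{\nearrow})$ by computing the closest-pair of every combinatorially distinct northeast quadrant (apex coordinates taken from the data), sorts the pairs by length, and answers a query apex $q$ by returning the first $\phi_i$ with $q \in W_i$. A linear scan stands in for the optimal point-location structures of [4,7], so only correctness, not the $O(\log m)$ bound, is illustrated. The names `ne_candidate_pairs` and `NEQuadrantRCP` are hypothetical, and points are assumed to be in general position (distinct coordinates and distinct pairwise distances).

```python
import itertools
import math
import random


def closest_pair(points):
    """Brute-force closest pair of (x, y) tuples as a sorted tuple, or None."""
    return min(itertools.combinations(sorted(points), 2),
               key=lambda ab: math.dist(*ab), default=None)


def ne_candidate_pairs(S):
    """Phi(S, Q^NE), sorted by length.

    S ∩ [qx, inf) x [qy, inf) only changes when qx or qy crosses a data
    coordinate, so it suffices to try apexes built from data coordinates.
    """
    xs = sorted({p[0] for p in S})
    ys = sorted({p[1] for p in S})
    cands = set()
    for qx in xs:
        for qy in ys:
            pair = closest_pair([p for p in S if p[0] >= qx and p[1] >= qy])
            if pair is not None:
                cands.add(pair)
    return sorted(cands, key=lambda ab: math.dist(*ab))


class NEQuadrantRCP:
    """Answers NE-quadrant RCP queries from the sorted candidate pairs.

    phi_i lies in the quadrant with apex q iff q is in
    W_i = (-inf, min x] x (-inf, min y]; the answer is the first such phi_i.
    """

    def __init__(self, S):
        self.pairs = ne_candidate_pairs(S)
        self.corners = [(min(a[0], b[0]), min(a[1], b[1])) for a, b in self.pairs]

    def query(self, qx, qy):
        # Linear scan in place of point location on the overlay of W_1, ..., W_m.
        for pair, (cx, cy) in zip(self.pairs, self.corners):
            if qx <= cx and qy <= cy:
                return pair
        return None


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(rng.random(), rng.random()) for _ in range(20)]
    ds = NEQuadrantRCP(S)
    print(len(ds.pairs), "candidate pairs;", ds.query(0.2, 0.3))
```

Because the list is sorted by length, the first $W_i$ that contains $q$ automatically yields the shortest pair inside the query, which is exactly what the overlay order encodes in $\Gamma$.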
Next, we analyze the average-case performance of the above data structure. In fact, it suffices to bound the expected number of the candidate pairs. Surprisingly, we have the following poly-logarithmic bound.
Lemma 2 For a random dataset $S \propto R^n$ where $R$ is an axes-parallel rectangle, $\mathbb{E}[|\Phi(S, \mathcal{Q})|] = O(\log^2 n)$.
Using the above lemma, we can immediately conclude that our data structure uses $O(\log^2 n)$ space in the average case. The average-case query time is in fact $O(\log\log n)$.
Theorem 3 There exists a $\mathcal{Q}$-RCP data structure $\mathcal{A}$ such that
• For any $S \subseteq \mathbb{R}^2$ of size $n$, $\mathrm{Space}(\mathcal{A}(S)) = O(n)$ and $\mathrm{Qtime}(\mathcal{A}(S)) = O(\log n)$.
• For a random $S \propto R^n$ where $R$ is the unit square or, more generally, an arbitrary axes-parallel rectangle, $\mathbb{E}[\mathrm{Space}(\mathcal{A}(S))] = O(\log^2 n)$ and $\mathbb{E}[\mathrm{Qtime}(\mathcal{A}(S))] = O(\log\log n)$.
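The $O(\log\log n)$ average-case query time can be derived from Lemma 2 via Jensen's inequality. Assume, as for any optimal point-location structure on a subdivision of complexity $O(m)$, that a query costs at most $c \log(m + 2)$ for some constant $c$, where $m = |\Phi(S, \mathcal{Q})|$ (the shift by 2 is only our device to avoid $\log 0$). Since $t \mapsto \log(t + 2)$ is concave:

```latex
\mathbb{E}\big[\mathrm{Qtime}(\mathcal{A}(S))\big]
  \le c\,\mathbb{E}\big[\log(m+2)\big]
  \le c\,\log\big(\mathbb{E}[m]+2\big)
  = c\,\log\big(O(\log^2 n)\big)
  = O(\log\log n).
```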

Strip query
We consider the RCP problem for strip queries, i.e., the $\mathcal{P}$-RCP problem. By symmetry, it suffices to consider the $\mathcal{P}_v$-RCP problem. Let $S \subseteq \mathbb{R}^2$ be a dataset of size $n$. Suppose $\Phi(S, \mathcal{P}_v) = \{\phi_1, \dots, \phi_m\}$ where $\phi_i = (a_i, b_i)$, and assume $\phi_1, \dots, \phi_m$ are sorted in increasing order of their lengths. It is known in [9] that $m = O(n \log n)$. We map each $\phi_i$ to the quadrant $W_i = (-\infty, \min\{a_i.x, b_i.x\}] \times [\max\{a_i.x, b_i.x\}, \infty)$, and observe that a query range $P = [x_1, x_2] \times \mathbb{R}$ contains $\phi_i$ iff the point $p = (x_1, x_2)$ lies in $W_i$. As such, the closest-pair in $S \cap P$ is $\phi_\eta$ for $\eta = \min\{i : p \in W_i\}$. Thus, as in Section 2, we can successively overlay $W_1, \dots, W_m$ to create a planar subdivision, and use point location to solve the problem in $O(m)$ space and $O(\log m)$ query time. Since $m = O(n \log n)$ here, we obtain a $\mathcal{P}$-RCP data structure using $O(n \log n)$ space with $O(\log n)$ query time in the worst case.
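The strip version of the mapping can be sketched the same way: a query strip $[x_1, x_2] \times \mathbb{R}$ becomes the point $p = (x_1, x_2)$, and $\phi_i$ is reported iff $p \in W_i = (-\infty, \min x] \times [\max x, \infty)$. As before, the candidate pairs are found by brute force over all strips whose ends are data x-coordinates, and a linear scan replaces point location. The names below are hypothetical, and general position (distinct x-coordinates and pairwise distances) is assumed.

```python
import itertools
import math
import random


def closest_pair(points):
    """Brute-force closest pair of (x, y) tuples as a sorted tuple, or None."""
    return min(itertools.combinations(sorted(points), 2),
               key=lambda ab: math.dist(*ab), default=None)


def strip_candidate_pairs(S):
    """Phi(S, P_v), sorted by length; strips with data x-coordinates as ends suffice."""
    xs = sorted(p[0] for p in S)
    cands = set()
    for i, x1 in enumerate(xs):
        for x2 in xs[i + 1:]:
            pair = closest_pair([p for p in S if x1 <= p[0] <= x2])
            if pair is not None:
                cands.add(pair)
    return sorted(cands, key=lambda ab: math.dist(*ab))


class VerticalStripRCP:
    """phi_i lies in [x1, x2] x R iff p = (x1, x2) lies in
    W_i = (-inf, min x] x [max x, inf); report the first phi_i whose W_i contains p."""

    def __init__(self, S):
        self.pairs = strip_candidate_pairs(S)
        self.spans = [(min(a[0], b[0]), max(a[0], b[0])) for a, b in self.pairs]

    def query(self, x1, x2):
        for pair, (lo, hi) in zip(self.pairs, self.spans):
            if x1 <= lo and hi <= x2:
                return pair
        return None


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(rng.random(), rng.random()) for _ in range(20)]
    ds = VerticalStripRCP(S)
    print(len(ds.pairs), "candidate pairs;", ds.query(0.1, 0.6))
```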
Next, we analyze the average-case performance of our data structure. Again, it suffices to bound the expected number of candidate pairs. For later use, we study here a more general case in which the candidate pairs are considered with respect to 3-sided rectangle queries. Recall that $I_1, \dots, I_n$ are aligned vertical (resp., horizontal) segments in $\mathbb{R}^2$ if there exist $r_1, \dots, r_n, \alpha, \beta \in \mathbb{R}$ such that $I_i = \{r_i\} \times [\alpha, \beta]$ (resp., $I_i = [\alpha, \beta] \times \{r_i\}$).

Lemma 4 Let $S \propto \prod_{i=1}^n I_i$ where $I_1, \dots, I_n$ are distinct aligned vertical (resp., horizontal) segments sorted from left to right (resp., from bottom to top). Suppose $a_i \in S$ is the point drawn on $I_i$. Then for $i, j \in \{1, \dots, n\}$ with $i < j$ and $\mathcal{X} \in \{\mathcal{U}^{\downarrow}, \mathcal{U}^{\uparrow}\}$ (resp., $\mathcal{X} \in \{\mathcal{U}^{\leftarrow}, \mathcal{U}^{\rightarrow}\}$), $\Pr[(a_i, a_j) \in \Phi(S, \mathcal{X})] = O(\log(j - i)/(j - i)^2)$.

From the above lemma, a direct calculation gives us the following corollary.
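First assume $S \propto \prod_{i=1}^n I_i$ as in Lemma 4, with vertical segments. By linearity of expectation, $\mathbb{E}[|\Phi(S, \mathcal{U}^{\downarrow})|] = \sum_{1 \le i < j \le n} \Pr[(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})]$. We plug the bound $\Pr[(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})] = O(\log(j - i)/(j - i)^2)$ shown in Lemma 4 into the above equation. Noting that $\sum_{t=1}^{\infty} \log t / t^2 = O(1)$, a direct calculation then gives us $\mathbb{E}[|\Phi(S, \mathcal{U}^{\downarrow})|] = O(n)$. Now assume $S \propto R^n$. Define a random multi-set $X = \{a.x : a \in S\}$, which consists of the x-coordinates of the $n$ random points in $S$. We shall show that for all $x_1, \dots, x_n \in [0, 1]$ such that $x_1 < \dots < x_n$,
$$\mathbb{E}\big[|\Phi(S, \mathcal{U}^{\downarrow})| \,\big|\, X = \{x_1, \dots, x_n\}\big] = O(n). \qquad (1)$$
If we set $I_i = \{x_i\} \times [0, 1]$ for $i \in \{1, \dots, n\}$, then $I_1, \dots, I_n$ are aligned vertical segments sorted from left to right. Note that, under the condition $X = \{x_1, \dots, x_n\}$, the $n$ random points in $S$ can be viewed as independently drawn from the uniform distributions on $I_1, \dots, I_n$, respectively. Thus, Equation 1 follows directly from our previous argument for the case $S \propto \prod_{i=1}^n I_i$. As a result, $\mathbb{E}[|\Phi(S, \mathcal{U}^{\downarrow})|] = O(n)$. Using the above argument and our previous data structure, we conclude the following.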
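The "direct calculation" can be spelled out by grouping pairs by their index difference $t = j - i$: there are $n - t$ pairs with difference $t$, the $n - 1$ probabilities with $t = 1$ are bounded trivially by 1, and $\sum_{t \ge 2} \log t / t^2$ converges because $\log t = O(\sqrt{t})$:

```latex
\mathbb{E}\big[|\Phi(S,\mathcal{U}^{\downarrow})|\big]
  = \sum_{t=1}^{n-1} \sum_{i=1}^{n-t} \Pr\big[(a_i, a_{i+t}) \in \Phi(S,\mathcal{U}^{\downarrow})\big]
  \le (n-1) + \sum_{t=2}^{n-1} (n-t) \cdot O\!\left(\frac{\log t}{t^2}\right)
  \le n + n \cdot O\!\left(\sum_{t=2}^{\infty} \frac{\log t}{t^2}\right)
  = O(n).
```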
Theorem 6 There exists a $\mathcal{P}$-RCP data structure $\mathcal{B}$ such that
• For any $S \subseteq \mathbb{R}^2$ of size $n$, $\mathrm{Space}(\mathcal{B}(S)) = O(n \log n)$ and $\mathrm{Qtime}(\mathcal{B}(S)) = O(\log n)$.
• For a random $S \propto R^n$ where $R$ is the unit square or, more generally, an arbitrary axes-parallel rectangle, $\mathbb{E}[\mathrm{Space}(\mathcal{B}(S))] = O(n)$ and $\mathbb{E}[\mathrm{Qtime}(\mathcal{B}(S))] = O(\log n)$.

Rectangle query
We consider the RCP problem for rectangle queries, i.e., the $\mathcal{R}$-RCP problem. Interestingly, our final solution for the $\mathcal{R}$-RCP problem is a combination of two simpler solutions, each of which partially achieves the desired bounds.
We first describe the common part of our two solutions. Let $S \subseteq \mathbb{R}^2$ be a dataset of size $n$. The common component of our two data structures is a standard 2D range tree built on $S$ [3]. The main tree (or primary tree) $T$ is a range tree built on the x-coordinates of the points in $S$. Each node $u \in T$ corresponds to a subset $S(u)$ of x-consecutive points in $S$, called the canonical subset of $u$. At $u$, there is an associated secondary tree $T_u$, which is a range tree built on the y-coordinates of the points in $S(u)$. With an abuse of notation, for each node $v \in T_u$, we still use $S(v)$ to denote the canonical subset of $v$, which is a subset of y-consecutive points in $S(u)$. As in [6], for each (non-leaf) primary node $u \in T$, we fix a vertical line $l_u$ such that the points in the canonical subset of the left (resp., right) child of $u$ are to the left (resp., right) of $l_u$. Similarly, for each (non-leaf) secondary node $v$, we fix a horizontal line $l_v$ such that the points in the canonical subset of the left (resp., right) child of $v$ are above (resp., below) $l_v$. Let $v \in T_u$ be a secondary node. Then at $v$ we have two lines $l_v$ and $l_u$, which partition $\mathbb{R}^2$ into four quadrants. We denote by $S_1(v), \dots, S_4(v)$ the subsets of $S(v)$ contained in these quadrants; see Figure 2a for the correspondence. In order to solve the problem, we need to store some additional data structures at the nodes of the tree (called sub-structures). At each secondary node $v$, we store four $\mathcal{Q}$-RCP data structures $\mathcal{A}(S_1(v)), \dots, \mathcal{A}(S_4(v))$ (Theorem 3). Now let us explain what we can do with this 2D range tree (with the sub-structures). Let $R = [x_1, x_2] \times [y_1, y_2] \in \mathcal{R}$ be a query rectangle. We first find in $T$ the splitting node $u \in T$ corresponding to the range $[x_1, x_2]$, which is by definition the LCA of all the leaves whose corresponding points are in $[x_1, x_2] \times \mathbb{R}$.

Figure 2: (a) Illustrating the subsets $S_1(v), \dots, S_4(v)$.
Then we find in $T_u$ the splitting node $v \in T_u$ corresponding to the range $[y_1, y_2]$. If either of the splitting nodes does not exist or is a leaf node, then $|S \cap R| \le 1$ and nothing should be reported. So assume $u$ and $v$ are non-leaf nodes. By the property of splitting nodes, we have $S \cap R = S(v) \cap R$, and the lines $l_u$ and $l_v$ both intersect $R$. Thus, $l_u$ and $l_v$ decompose $R$ into four smaller rectangles $R_1, \dots, R_4$; see Figure 2b for the correspondence. By construction, we have $S(v) \cap R_i = S_i(v) \cap R_i$. In order to find the closest-pair in $S \cap R$, we first compute the closest-pair in $S \cap R_i$ for all $i \in \{1, \dots, 4\}$. This can be done by querying the sub-structures stored at $v$. Indeed, $S \cap R_i = S_i(v) \cap R_i = S_i(v) \cap Q_i$, where $Q_i$ is the quadrant obtained by removing the two sides of $R_i$ that coincide with $l_u$ and $l_v$ (every point of $S_i(v)$ automatically satisfies the two removed constraints). Therefore, we can query $\mathcal{A}(S_i(v))$ with $Q_i$ to find the closest-pair in $S \cap R_i$. Once the four closest-pairs are computed, we take the shortest one (i.e., the one of the smallest length) among them and denote it by $\phi$.
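The splitting-node search is the same top-down walk at both levels (in $T$ on x-coordinates, then in $T_u$ on y-coordinates), so one level suffices to illustrate it. The sketch below uses an implicit balanced tree over the sorted coordinates instead of explicit nodes and assumes distinct coordinates; the function name and its return convention are our own.

```python
import bisect
import random


def splitting_node(xs, x1, x2):
    """Splitting node for [x1, x2] in an implicit balanced range tree on sorted xs.

    A node is a half-open index range [lo, hi) with children [lo, mid) and
    [mid, hi), mid = (lo + hi) // 2. Returns (lo, mid, hi), or None when at
    most one coordinate lies in [x1, x2] (nothing to report). Any vertical
    line strictly between xs[mid - 1] and xs[mid] can serve as l_u.
    """
    i = bisect.bisect_left(xs, x1)    # first index with xs[i] >= x1
    j = bisect.bisect_right(xs, x2)   # first index with xs[j] > x2
    if j - i <= 1:
        return None
    lo, hi = 0, len(xs)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if j <= mid:                  # the whole query range is in the left child
            hi = mid
        elif i >= mid:                # the whole query range is in the right child
            lo = mid
        else:                         # the query range straddles mid: split here
            return lo, mid, hi
    return None                       # not reached when j - i >= 2


if __name__ == "__main__":
    rng = random.Random(0)
    xs = sorted(rng.random() for _ in range(16))
    print(splitting_node(xs, 0.2, 0.7))
```

The walk keeps the invariant that all indices of $S \cap ([x_1, x_2] \times \mathbb{R})$ lie in the current node, so the node where it stops contains the whole query range while both children receive at least one query point, which is why $l_u$ crosses the query.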
Clearly, $\phi$ is not necessarily the closest-pair in $S \cap R$, as the two points of the closest-pair may belong to different $R_i$'s. However, as we will see, with $\phi$ in hand, finding the closest-pair in $S \cap R$ becomes easier. Suppose $l_u : x = \alpha$ and $l_v : y = \beta$, where $x_1 \le \alpha \le x_2$ and $y_1 \le \beta \le y_2$. Let $\delta$ be the length of $\phi$. We define $P_\alpha = [\alpha - \delta, \alpha + \delta] \times \mathbb{R}$ (resp., $P_\beta = \mathbb{R} \times [\beta - \delta, \beta + \delta]$) and $R_\alpha = R \cap P_\alpha$ (resp., $R_\beta = R \cap P_\beta$); see Figure 3. We have the following key observation.
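Lemma 7 Let $\phi_\alpha$ (resp., $\phi_\beta$) be the closest-pair in $S \cap R_\alpha$ (resp., $S \cap R_\beta$). Then the closest-pair in $S \cap R$ is the shortest one among $\phi$, $\phi_\alpha$, $\phi_\beta$.
Proof. Let $\phi^* = (a^*, b^*)$ be the closest-pair in $S \cap R$. Since $\phi$, $\phi_\alpha$, $\phi_\beta$ are all pairs in $S \cap R$, it suffices to show that $\phi^* \in \{\phi, \phi_\alpha, \phi_\beta\}$. If $\phi^* = \phi$, we are done. So assume $\phi^* \neq \phi$. Then $a^*$ and $b^*$ must be contained in different $R_i$'s. It follows that the segment $[a^*, b^*]$ intersects either $l_u$ or $l_v$. Note that the length of $\phi^*$ is at most $\delta$ (recall that $\delta$ is the length of $\phi$), which implies $|a^*.x - b^*.x| \le \delta$ and $|a^*.y - b^*.y| \le \delta$. If $[a^*, b^*]$ intersects $l_u$, then both $a^*$ and $b^*$ are within horizontal distance $\delta$ of $l_u$, so $a^*, b^* \in R_\alpha$ and hence $\phi^* = \phi_\alpha$. Symmetrically, if $[a^*, b^*]$ intersects $l_v$, then $\phi^* = \phi_\beta$.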
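Lemma 7 can be checked empirically with a brute-force sketch: compute $\phi$ from the four sub-rectangles cut out by $x = \alpha$ and $y = \beta$, build $R_\alpha$ and $R_\beta$ from its length $\delta$, and compare the shortest of $\phi, \phi_\alpha, \phi_\beta$ with the true closest-pair in $S \cap R$. The lemma only uses that the two lines cross $R$, so the sketch picks $\alpha$ and $\beta$ at random inside the query rectangle rather than from a range tree. All names are hypothetical, and closest-pairs are computed by brute force instead of by the sub-structures.

```python
import itertools
import math
import random

INF = float("inf")


def closest(points):
    """(length, pair) for the closest pair of (x, y) tuples; (inf, None) if < 2 points."""
    best = (INF, None)
    for a, b in itertools.combinations(points, 2):
        d = math.dist(a, b)
        if d < best[0]:
            best = (d, (a, b))
    return best


def lemma7_closest(S, rect, alpha, beta):
    """Closest pair in S ∩ R from phi, phi_alpha, phi_beta, as in Lemma 7.

    rect = (x1, x2, y1, y2) is the query rectangle R; the lines x = alpha and
    y = beta (playing the roles of l_u and l_v) must cross R.
    """
    x1, x2, y1, y2 = rect
    pts = [p for p in S if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    # Group S ∩ R into the four sub-rectangles R_1, ..., R_4 cut out by the lines.
    parts = {}
    for p in pts:
        parts.setdefault((p[0] < alpha, p[1] < beta), []).append(p)
    # phi = shortest of the four sub-rectangle answers; delta = its length.
    phi = min((closest(part) for part in parts.values()),
              key=lambda t: t[0], default=(INF, None))
    delta = phi[0]
    # R_alpha = R ∩ ([alpha - delta, alpha + delta] x R); R_beta is symmetric.
    phi_alpha = closest([p for p in pts if abs(p[0] - alpha) <= delta])
    phi_beta = closest([p for p in pts if abs(p[1] - beta) <= delta])
    return min(phi, phi_alpha, phi_beta, key=lambda t: t[0])


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(rng.random(), rng.random()) for _ in range(40)]
    print(lemma7_closest(S, (0.1, 0.9, 0.2, 0.8), 0.5, 0.45))
```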
Due to the above lemma, it now suffices to compute $\phi_\alpha$ and $\phi_\beta$. Note that $R_\alpha$ and $R_\beta$ are rectangles, so computing $\phi_\alpha$ and $\phi_\beta$ still requires rectangle RCP queries. Fortunately, some additional properties make it easy to search for the closest-pairs in $S \cap R_\alpha$ and $S \cap R_\beta$. For a set $A$ of points in $\mathbb{R}^2$ and $a, b \in A$, we define the x-gap (resp., y-gap) between $a$ and $b$ in $A$ as the number of points in $A \setminus \{a, b\}$ whose x-coordinates (resp., y-coordinates) are between $a.x$ and $b.x$ (resp., $a.y$ and $b.y$).
Lemma 8 There exists a constant integer $k$ such that the y-gap (resp., x-gap) between the two points of $\phi_\alpha$ (resp., $\phi_\beta$) in $S \cap R_\alpha$ (resp., $S \cap R_\beta$) is at most $k$.
Proof. We only need to consider $\phi_\alpha$. Let $k = 100$. Suppose $\phi_\alpha = (a, b)$, and assume without loss of generality that $a.y \le b.y$. We denote by $w$ the left-right width of $R_\alpha$, i.e., the distance between the left and right boundaries of $R_\alpha$. By the construction of $R_\alpha$, we have $w \le 2\delta$. We consider two cases: $|a.y - b.y| \ge 2w$ and $|a.y - b.y| < 2w$. Suppose $|a.y - b.y| \ge 2w$, and assume there are more than $k$ points in $(S \cap R_\alpha) \setminus \{a, b\}$ whose y-coordinates are between $a.y$ and $b.y$. Then we can find, among these points, two points $a'$ and $b'$ such that $|a'.y - b'.y| \le |a.y - b.y|/k$. Since $a', b' \in R_\alpha$, we have $|a'.x - b'.x| \le w \le |a.y - b.y|/2$, and hence $\mathrm{dist}(a', b') \le |a.y - b.y|/2 + |a.y - b.y|/k < |a.y - b.y| \le \mathrm{dist}(a, b)$, which contradicts the fact that $\phi_\alpha$ is the closest-pair in $S \cap R_\alpha$. Next, suppose $|a.y - b.y| < 2w$. Then $|a.y - b.y| < 4\delta$. Consider the rectangle $R^* = R_\alpha \cap (\mathbb{R} \times [a.y, b.y])$. Note that $(S \cap R^*) \setminus \{a, b\}$ consists of exactly the points in $(S \cap R_\alpha) \setminus \{a, b\}$ whose y-coordinates are between $a.y$ and $b.y$. Therefore, it suffices to show $|S \cap R^*| \le k$. Let $R^*_i = R^* \cap R_i$ for $i \in \{1, \dots, 4\}$. Since $R^*_i \subseteq R_i$, the pairwise distances of the points in $S \cap R^*_i$ are at least $\delta$. Furthermore, the left-right width of each $R^*_i$ is at most $\delta$, and the top-bottom width of each $R^*_i$ is at most $|a.y - b.y|$ (which is smaller than $4\delta$). Therefore, a simple argument using the pigeonhole principle shows that $|S \cap R^*_i| \le 16 < k/4$. As such, $|S \cap R^*| \le k$. We shall use the above lemma to help compute $\phi_\alpha$ and $\phi_\beta$. At this point, our two solutions diverge.
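The pigeonhole bound $|S \cap R^*_i| \le 16$ in the proof of Lemma 8 can be made explicit with a grid of side $\delta/2$ (one convenient choice; the proof does not fix a particular grid). Recall that $R^*_i$ has width at most $\delta$ and height less than $4\delta$, and that distinct points of $S \cap R^*_i$ are at distance at least $\delta$:

```latex
R^*_i \subseteq [s, s+\delta] \times [t, t+4\delta] \ \text{for some } s, t \in \mathbb{R};
\quad
2 \times 8 = 16 \ \text{closed cells of side } \tfrac{\delta}{2};
\quad
\operatorname{diam}(\text{cell}) = \tfrac{\delta}{\sqrt{2}} < \delta
\;\Longrightarrow\; \text{at most one point of } S \cap R^*_i \text{ per cell}
\;\Longrightarrow\; |S \cap R^*_i| \le 16.
```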

Preliminary: Extreme point data structures
Before presenting our solutions, we introduce the so-called top/bottom extreme point (TBEP) and left/right extreme point (LREP) data structures. For a query space $\mathcal{X}$ and a constant integer $k$, an $(\mathcal{X}, k)$-TBEP (resp., $(\mathcal{X}, k)$-LREP) data structure stores a given set $S$ of points in $\mathbb{R}^2$ and can report the $k$ topmost and $k$ bottommost (resp., $k$ leftmost and $k$ rightmost) points in $S \cap X$ for a query range $X \in \mathcal{X}$.
Lemma 9 Let $k$ be a constant integer. There exists a $(\mathcal{P}_v, k)$-TBEP data structure $\mathcal{K}_v$ such that for any $S \subseteq \mathbb{R}^2$ of size $n$, $\mathrm{Space}(\mathcal{K}_v(S)) = O(n)$ and $\mathrm{Qtime}(\mathcal{K}_v(S)) = O(\log n)$. Symmetrically, there also exists a $(\mathcal{P}_h, k)$-LREP data structure $\mathcal{K}_h$ satisfying the same bounds.
Proof. Let $S \subseteq \mathbb{R}^2$ be a dataset of size $n$. The $(\mathcal{P}_v, k)$-TBEP data structure instance $\mathcal{K}_v(S)$ is a standard 1D range tree $T$ built on the x-coordinates of the points in $S$. By the construction of a range tree, each node $u \in T$ corresponds to a subset $S(u)$ of x-consecutive points in $S$, called the canonical subset of $u$. The leaves of $T$ are in one-to-one correspondence with the points in $S$. At each node $u \in T$, we store the $k$ topmost and $k$ bottommost points in $S(u)$; we denote the set of these (at most) $2k$ points by $K(u)$. The overall space cost of the range tree (with the stored points) is clearly $O(n)$, as $k$ is a constant. To answer a query $P = [x_1, x_2] \times \mathbb{R} \in \mathcal{P}_v$, we first find the $t = O(\log n)$ canonical nodes $u_1, \dots, u_t \in T$ corresponding to the range $[x_1, x_2]$. This is a standard range-tree operation, which can be done in $O(\log n)$ time. We compute $K = \bigcup_{i=1}^t K(u_i)$ in $O(\log n)$ time. We then use selection to find the $k$ topmost and $k$ bottommost points in $K$; this can be done in $O(\log n)$ time since $|K| \le 2kt = O(\log n)$. These $2k$ points are exactly the $k$ topmost and $k$ bottommost points in $S \cap P$, because $S \cap P$ is the disjoint union of $S(u_1), \dots, S(u_t)$ and each $K(u_i)$ contains the $k$ topmost and $k$ bottommost points of $S(u_i)$. The $(\mathcal{P}_h, k)$-LREP data structure $\mathcal{K}_h$ is constructed in a symmetric way.
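The construction in the proof of Lemma 9 can be sketched as follows. An array-based segment tree over the points sorted by x stands in for the 1D range tree (a layout choice on our part), each node keeps the set $K(u)$ of its $k$ topmost and $k$ bottommost points, and a query unions $K(u_i)$ over the canonical nodes before selecting with `heapq` rather than a linear-time selection routine. The class name is hypothetical, and points are assumed to be distinct with distinct y-coordinates.

```python
import bisect
import heapq
import random


class TopBottomExtremes:
    """Sketch of a (P_v, k)-TBEP structure (cf. Lemma 9).

    Every segment-tree node keeps the k topmost and k bottommost points of its
    canonical subset, so the total space is O(k n).
    """

    def __init__(self, points, k):
        self.k = k
        pts = sorted(points)                       # sort by x-coordinate
        self.xs = [p[0] for p in pts]
        self.size = 1
        while self.size < len(pts):
            self.size *= 2
        self.node = [[] for _ in range(2 * self.size)]
        for i, p in enumerate(pts):
            self.node[self.size + i] = [p]
        for v in range(self.size - 1, 0, -1):      # K(parent) from K(children)
            self.node[v] = self._extremes(self.node[2 * v] + self.node[2 * v + 1])

    def _extremes(self, pts):
        """K(u): the k topmost and k bottommost points, without duplicates."""
        top = heapq.nlargest(self.k, pts, key=lambda p: p[1])
        bottom = heapq.nsmallest(self.k, pts, key=lambda p: p[1])
        return list(dict.fromkeys(top + bottom))

    def query(self, x1, x2):
        """(k topmost, k bottommost) points of S ∩ ([x1, x2] x R)."""
        lo = bisect.bisect_left(self.xs, x1) + self.size
        hi = bisect.bisect_right(self.xs, x2) + self.size
        cand = []                                  # union of K(u_i) over canonical nodes
        while lo < hi:
            if lo & 1:
                cand += self.node[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                cand += self.node[hi]
            lo //= 2
            hi //= 2
        top = heapq.nlargest(self.k, cand, key=lambda p: p[1])
        bottom = heapq.nsmallest(self.k, cand, key=lambda p: p[1])
        return top, bottom


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(100)]
    print(TopBottomExtremes(pts, k=2).query(0.25, 0.75))
```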
Lemma 10 Let $l$ be a vertical (resp., horizontal) line and $k$ be a constant integer. There exists a $(\mathcal{P}_l, k)$-TBEP (resp., $(\mathcal{P}_l, k)$-LREP) data structure $\mathcal{K}_l$ such that for $S \propto \prod_{i=1}^n I_i$ where $I_1, \dots, I_n$ are distinct aligned vertical (resp., horizontal) segments, $\mathbb{E}[\mathrm{Space}(\mathcal{K}_l(S))] = O(\log n)$ and $\mathbb{E}[\mathrm{Qtime}(\mathcal{K}_l(S))] = O(\log\log n)$.

First solution
We now introduce our first solution, which achieves the desired worst-case bounds. Let $k$ be the constant integer in Lemma 8. In our first solution, besides the 2D range tree presented before, we additionally build two 1D range trees $T'$ and $T''$ on $S$, where $T'$ (resp., $T''$) is built on y-coordinates (resp., x-coordinates). For $u' \in T'$ (resp., $u'' \in T''$), we still use $S(u')$ (resp., $S(u'')$) to denote the canonical subset of $u'$ (resp., $u''$). At each node $u' \in T'$, we store a $\mathcal{P}$-RCP data structure $\mathcal{B}(S(u'))$ (Theorem 6) and a $(\mathcal{P}_v, k)$-TBEP data structure $\mathcal{K}_v(S(u'))$ (Lemma 9). Similarly, at each node $u'' \in T''$, we store a $\mathcal{P}$-RCP data structure $\mathcal{B}(S(u''))$ (Theorem 6) and a $(\mathcal{P}_h, k)$-LREP data structure $\mathcal{K}_h(S(u''))$ (Lemma 9).
We now explain how to compute $\phi_\alpha$ and $\phi_\beta$.
To compute $\phi_\alpha$, write $R_\alpha = [x_\alpha, x'_\alpha] \times [y_\alpha, y'_\alpha]$, $P_x = [x_\alpha, x'_\alpha] \times \mathbb{R}$, and $P_y = \mathbb{R} \times [y_\alpha, y'_\alpha]$, so that $R_\alpha = P_x \cap P_y$. We first find in $T'$ the $t = O(\log n)$ canonical nodes $u'_1, \dots, u'_t \in T'$ corresponding to the range $[y_\alpha, y'_\alpha]$. Then $\bigcup_{i=1}^t S(u'_i) = S \cap P_y$, and each $S(u'_i)$ is a set of y-consecutive points in $S \cap P_y$. Furthermore, $S \cap R_\alpha = \bigcup_{i=1}^t (S(u'_i) \cap P_x)$. We query the sub-structures $\mathcal{B}(S(u'_1)), \dots, \mathcal{B}(S(u'_t))$ with $P_x$ to find the closest-pairs $\phi_1, \dots, \phi_t$ in $S(u'_1) \cap P_x, \dots, S(u'_t) \cap P_x$, respectively. We also query $\mathcal{K}_v(S(u'_1)), \dots, \mathcal{K}_v(S(u'_t))$ with $P_x$ to obtain the $k$ topmost and $k$ bottommost points in $S(u'_1) \cap P_x, \dots, S(u'_t) \cap P_x$, respectively; we denote by $K$ the set of the (at most) $2kt$ reported points. Then we find the closest-pair $\phi_K$ in $K$ using the standard divide-and-conquer algorithm. We claim that $\phi_\alpha$ is the shortest one among $\{\phi_1, \dots, \phi_t, \phi_K\}$. Suppose $\phi_\alpha = (a, b)$. If $a$ and $b$ are both contained in some $S(u'_i)$, then clearly $\phi_\alpha = \phi_i$. Otherwise, since the canonical subsets are y-consecutive, Lemma 8 and the choice of $k$ imply that each of $a$ and $b$ is among the $k$ topmost or $k$ bottommost points of its canonical subset within $P_x$, so both points belong to $K$ and hence $\phi_\alpha = \phi_K$. It follows that $\phi_\alpha \in \{\phi_1, \dots, \phi_t, \phi_K\}$. Furthermore, because the pairs $\phi_1, \dots, \phi_t, \phi_K$ are all contained in $R_\alpha$, $\phi_\alpha$ must be the shortest one among them. Therefore, with $\phi_1, \dots, \phi_t, \phi_K$ in hand, $\phi_\alpha$ can be easily computed. The pair $\phi_\beta$ is computed symmetrically using $T''$. Finally, taking the shortest one among $\{\phi, \phi_\alpha, \phi_\beta\}$, the query $R$ can be answered.
The 2D range tree together with the two 1D range trees $T'$ and $T''$ forms an $\mathcal{R}$-RCP data structure, which is our first solution. A straightforward analysis gives the worst-case space cost and query time of this data structure.
Theorem 11 There exists an $\mathcal{R}$-RCP data structure $\mathcal{D}_1$ such that for any $S \subseteq \mathbb{R}^2$ of size $n$, $\mathrm{Space}(\mathcal{D}_1(S)) = O(n \log^2 n)$ and $\mathrm{Qtime}(\mathcal{D}_1(S)) = O(\log^2 n)$.
Proof. We first analyze the space cost. Let $v$ be a secondary node of the 2D range tree. By Theorem 3, the space cost of the sub-structures stored at $v$ is $O(|S(v)|)$. Therefore, for a primary node $u \in T$ of the 2D range tree, the space cost of $T_u$ (with the sub-structures) is $O(|S(u)| \log |S(u)|)$. As a result, the entire space cost of the 2D range tree is $O(n \log^2 n)$. Let $u'$ be a node of the 1D range tree $T'$. By Theorem 6 and Lemma 9, the space cost of the sub-structures stored at $u'$ is $O(|S(u')| \log |S(u')|)$. As such, the entire space cost of $T'$ is $O(n \log^2 n)$. For the same reason, the space cost of $T''$ is $O(n \log^2 n)$, and hence the entire space cost of $\mathcal{D}_1$ is $O(n \log^2 n)$.
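The step from per-node to total space (for $T'$, and likewise for the primary tree) is the usual level-by-level count: canonical subsets of nodes at the same depth are disjoint, so they sum to at most $n$ per level, and a balanced range tree has $O(\log n)$ levels:

```latex
\sum_{u' \in T'} |S(u')| \log |S(u')|
  \le \log n \sum_{d=0}^{O(\log n)} \; \sum_{u' \text{ at depth } d} |S(u')|
  \le \log n \cdot O(\log n) \cdot n
  = O(n \log^2 n).
```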
Then we analyze the query time. When answering a query, we need to compute the pairs $\phi, \phi_\alpha, \phi_\beta$ in Lemma 7. To compute $\phi$, we first find the splitting nodes $u \in T$ and $v \in T_u$. This is done by a top-down walk in $T$ and $T_u$, which takes $O(\log n)$ time. Then we query the sub-structures $\mathcal{A}(S_1(v)), \dots, \mathcal{A}(S_4(v))$, which can be done in $O(\log n)$ time by Theorem 3. Thus, the time for computing $\phi$ is $O(\log n)$. To compute $\phi_\alpha$, we first find the $t = O(\log n)$ canonical nodes $u'_1, \dots, u'_t \in T'$, which can be done in $O(\log n)$ time. Then we query the sub-structures $\mathcal{B}(S(u'_1)), \dots, \mathcal{B}(S(u'_t))$ and $\mathcal{K}_v(S(u'_1)), \dots, \mathcal{K}_v(S(u'_t))$ to obtain the pairs $\phi_1, \dots, \phi_t$ and the set $K$ of at most $2kt$ points. By Theorem 6 and Lemma 9, each of these $2t$ queries takes $O(\log n)$ time, so this step takes $O(\log^2 n)$ time. Finally, we compute the closest-pair $\phi_K$ in $K$ using the standard divide-and-conquer algorithm, which takes $O(\log n \log\log n)$ time since $|K| = O(\log n)$. Thus, the time for computing $\phi_\alpha$ is $O(\log^2 n)$, and so is the time for computing $\phi_\beta$. As a result, the overall query time is $O(\log^2 n)$.
Our first solution alone already achieves the desired worst-case bounds, simultaneously improving the results given in [6] and [9].

Second solution
We now introduce our second solution, which has the desired average-case space cost and an $O(\log n)$ query time (even in the worst case). In our second solution, we only use the 2D range tree presented before, but we need some additional sub-structures stored at each secondary node. Let $k$ be the constant integer in Lemma 8. For a secondary node $v$, define $S^{\uparrow}(v)$ (resp., $S^{\downarrow}(v)$) as the subset of $S(v)$ consisting of the points above (resp., below) $l_v$. Similarly, define $S^{\leftarrow}(v)$ and $S^{\rightarrow}(v)$ as the subsets to the left and right of $l_u$, respectively. Let $v \in T_u$ be a secondary node. Besides $\mathcal{A}(S_1(v)), \dots, \mathcal{A}(S_4(v))$, we store at $v$ two $(\mathcal{P}_{l_u}, k)$-TBEP data structures $\mathcal{K}_{l_u}(S^{\uparrow}(v))$ and $\mathcal{K}_{l_u}(S^{\downarrow}(v))$ (Lemma 10), and two $(\mathcal{P}_{l_v}, k)$-LREP data structures $\mathcal{K}_{l_v}(S^{\leftarrow}(v))$ and $\mathcal{K}_{l_v}(S^{\rightarrow}(v))$ (Lemma 10). Furthermore, we need a new kind of sub-structure called range shortest-segment (RSS) data structures. For a query space $\mathcal{X}$, an $\mathcal{X}$-RSS data structure stores a given set of segments in $\mathbb{R}^2$ and can report the shortest segment contained in a query range $X \in \mathcal{X}$. For the case $\mathcal{X} = \mathcal{U}$, we have the following RSS data structure.
Lemma 12 There exists a $\mathcal{U}$-RSS data structure $\mathcal{C}$ such that for any set $G$ of $m$ segments, $\mathrm{Space}(\mathcal{C}(G)) = O(m^2)$ and $\mathrm{Qtime}(\mathcal{C}(G)) = O(\log m)$.
Proof. It suffices to design the $\mathcal{U}^{\downarrow}$-RSS data structure. We first note that there exists a $\mathcal{P}_v$-RSS data structure using $O(m)$ space with $O(\log m)$ query time: applying the method of Section 3 directly gives such a data structure (a segment here plays the role of a candidate pair in Section 3). With this $\mathcal{P}_v$-RSS data structure in hand, it is straightforward to design the desired $\mathcal{U}^{\downarrow}$-RSS data structure. Let $G = \{\sigma_1, \dots, \sigma_m\}$ be a set of $m$ segments where $\sigma_i = [a_i, b_i]$. Define $y_i = \max\{a_i.y, b_i.y\}$ and assume $y_1 \le \dots \le y_m$. Now build a (balanced) binary search tree with keys $y_1, \dots, y_m$. We denote by $u_i$ the node corresponding to $y_i$. At $u_i$, we store a $\mathcal{P}_v$-RSS data structure as described above, built on the subset $G_i = \{\sigma_1, \dots, \sigma_i\} \subseteq G$. The overall space cost is clearly $O(m^2)$. To answer a query $U = [x_1, x_2] \times (-\infty, y] \in \mathcal{U}^{\downarrow}$, we first use the binary search tree to find the largest index $i$ such that $y_i \le y$. This can be done in $O(\log m)$ time. Let $P = [x_1, x_2] \times \mathbb{R}$. A segment of $G$ is contained in $U$ iff it belongs to $G_i$ and is contained in $P$. Thus, we can find the desired segment by querying the $\mathcal{P}_v$-RSS data structure stored at $u_i$ with $P$, which takes $O(\log m)$ time.
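The reduction in the proof of Lemma 12 can be sketched as below. Segments are sorted by their top endpoint $y_i$, a binary search finds the prefix $G_i$ of segments whose top is at most $y$, and the remaining question is a vertical-strip RSS query on $G_i$. The sketch replaces the per-prefix $\mathcal{P}_v$-RSS structures (which account for the $O(m^2)$ space) with a brute-force scan over the prefix, so it illustrates the correctness of the reduction only, not the stated bounds; `DownRSS` and `_strip_rss` are hypothetical names.

```python
import bisect
import math
import random


class DownRSS:
    """Sketch of the U^down-RSS reduction in the proof of Lemma 12.

    Segments are sorted by y_i = max(a.y, b.y). A query [x1, x2] x (-inf, y]
    first finds the largest i with y_i <= y (binary search), then asks for the
    shortest segment of the prefix G_i inside the strip [x1, x2] x R.
    """

    def __init__(self, segments):
        # Each segment is a pair of endpoints ((ax, ay), (bx, by)).
        self.segs = sorted(segments, key=lambda s: max(s[0][1], s[1][1]))
        self.tops = [max(a[1], b[1]) for a, b in self.segs]

    def _strip_rss(self, i, x1, x2):
        """Stand-in for the P_v-RSS structure on G_i: a brute-force scan."""
        inside = [s for s in self.segs[:i]
                  if x1 <= min(s[0][0], s[1][0]) and max(s[0][0], s[1][0]) <= x2]
        return min(inside, key=lambda s: math.dist(*s), default=None)

    def query(self, x1, x2, y):
        i = bisect.bisect_right(self.tops, y)      # G_i = segments whose top is <= y
        return self._strip_rss(i, x1, x2)


if __name__ == "__main__":
    rng = random.Random(0)
    segs = [((rng.random(), rng.random()), (rng.random(), rng.random())) for _ in range(30)]
    print(DownRSS(segs).query(0.1, 0.9, 0.8))
```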
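Define $\Phi^{\uparrow}(v) = \Phi_{l_u}(S^{\uparrow}(v), \mathcal{U}^{\downarrow})$, $\Phi^{\downarrow}(v) = \Phi_{l_u}(S^{\downarrow}(v), \mathcal{U}^{\uparrow})$, $\Phi^{\leftarrow}(v) = \Phi_{l_v}(S^{\leftarrow}(v), \mathcal{U}^{\rightarrow})$, and $\Phi^{\rightarrow}(v) = \Phi_{l_v}(S^{\rightarrow}(v), \mathcal{U}^{\leftarrow})$, and regard them as four sets of segments by identifying each point-pair $(a, b)$ with the segment $[a, b]$. Then we apply Lemma 12 to build and store at $v$ four $\mathcal{U}$-RSS data structures $\mathcal{C}(\Phi^{\uparrow}(v))$, $\mathcal{C}(\Phi^{\downarrow}(v))$, $\mathcal{C}(\Phi^{\leftarrow}(v))$, and $\mathcal{C}(\Phi^{\rightarrow}(v))$. We now explain how to compute $\phi_\alpha$ and $\phi_\beta$. Let us consider $\phi_\alpha$. Recall that $\phi_\alpha$ is the closest-pair in $S \cap R_\alpha$, i.e., in $S(v) \cap R_\alpha$. Let $P$ be the $l_u$-anchored strip obtained by removing the top and bottom boundaries of $R_\alpha$. If the two points of $\phi_\alpha$ are on opposite sides of $l_v$, then by Lemma 8 they must be among the $k$ bottommost points in $S^{\uparrow}(v) \cap P$ and the $k$ topmost points in $S^{\downarrow}(v) \cap P$, respectively. Using $\mathcal{K}_{l_u}(S^{\uparrow}(v))$ and $\mathcal{K}_{l_u}(S^{\downarrow}(v))$, we report these $2k$ points and compute the closest-pair among them by brute force. If the two points of $\phi_\alpha$ are on the same side of $l_v$, then they are both contained in either $S^{\uparrow}(v)$ or $S^{\downarrow}(v)$. So it suffices to compute the closest-pairs in $S^{\uparrow}(v) \cap R_\alpha$ and $S^{\downarrow}(v) \cap R_\alpha$. Without loss of generality, we only consider the closest-pair in $S^{\uparrow}(v) \cap R_\alpha$. We denote by $U$ the 3-sided rectangle obtained by removing the bottom boundary of $R_\alpha$, and by $Q_1$ (resp., $Q_2$) the quadrant obtained by removing the right (resp., left) boundary of $U$. We query $\mathcal{A}(S_1(v))$ with $Q_1$, $\mathcal{A}(S_2(v))$ with $Q_2$, and $\mathcal{C}(\Phi^{\uparrow}(v))$ with $U$. The shortest one among the three answers is the closest-pair in $S^{\uparrow}(v) \cap R_\alpha$. Indeed, the three answers are all point-pairs in $S^{\uparrow}(v) \cap R_\alpha$. If the two points of the closest-pair in $S^{\uparrow}(v) \cap R_\alpha$ are both to the left (resp., right) of $l_u$, then $\mathcal{A}(S_1(v))$ (resp., $\mathcal{A}(S_2(v))$) reports it; otherwise, the closest-pair crosses $l_u$, and $\mathcal{C}(\Phi^{\uparrow}(v))$ reports it. This shows how to compute $\phi_\alpha$; $\phi_\beta$ can be computed symmetrically. Finally, taking the shortest one among $\{\phi, \phi_\alpha, \phi_\beta\}$, the query $R$ can be answered.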
A straightforward analysis shows that the overall query time is O(log n) even in the worst case. The worst-case space cost is not near-linear, as the U-RSS data structure C may occupy quadratic space by Lemma 12. Interestingly, we can show that the average-case space cost is in fact O(n log n). The crucial step is to bound the average-case space of the sub-structures stored at the secondary nodes. The intuition for bounding the average-case space of the Q-RCP and TBEP/LREP sub-structures comes directly from the average-case performance of our Q-RCP data structure (Theorem 3) and TBEP/LREP data structure (Lemma 10). However, bounding the average-case space of the U-RSS sub-structures is much more difficult. By our construction, the segments stored in these sub-structures are 3-sided candidate pairs that cross a line. As such, we have to study the expected number of such candidate pairs in a random dataset. To this end, we recall Lemma 4. Let l be a vertical line, and let $S \propto \prod_{i=1}^{n} I_i$ be a random dataset drawn from aligned vertical segments $I_1, \dots, I_n$ as in Lemma 4. Suppose we build a U-RSS data structure , which can also be obtained using Lemma 4, but requires nontrivial work.
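For small experiments with the quantity studied here (3-sided candidate pairs crossing a line), the candidate pairs $\Phi(S, \mathcal{U}^{\downarrow})$ can be enumerated by brute force. The test below relies on the fact, used in the proof of Lemma 4, that $(a, b)$ is a candidate pair iff it is the closest pair in $S \cap U$ for the minimal bottom-unbounded rectangle $U$ containing $a$ and $b$. The sketch assumes distinct pairwise distances. Function names are hypothetical, and the random dataset mimics Lemma 4's setting with one point on each segment $\{i\} \times [0, 1]$.

```python
import random
from itertools import combinations
from math import dist

def three_sided_candidates(points):
    """Phi(S, U_down) by brute force: (a, b) is a candidate pair iff it is the closest
    pair in S ∩ U for the minimal bottom-unbounded rectangle
    U = [min x, max x] x (-inf, max y] containing a and b. O(n^4) time; tiny inputs only."""
    result = []
    for a, b in combinations(points, 2):
        x_lo, x_hi, y_hi = min(a[0], b[0]), max(a[0], b[0]), max(a[1], b[1])
        inside = [p for p in points if x_lo <= p[0] <= x_hi and p[1] <= y_hi]
        d = dist(a, b)
        if all(dist(p, q) >= d for p, q in combinations(inside, 2)):
            result.append((a, b))
    return result

def crossing_pairs(pairs, x0):
    """Pairs whose two points lie strictly on opposite sides of the vertical line x = x0."""
    return [(a, b) for a, b in pairs if min(a[0], b[0]) < x0 < max(a[0], b[0])]

rng = random.Random(1)
S = [(i, rng.random()) for i in range(12)]  # one point on each segment {i} x [0, 1]
print(len(crossing_pairs(three_sided_candidates(S), 5.5)))
```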
Lemma 13. Let l be a vertical (resp., horizontal) line and $S \propto \prod_{i=1}^{n} I_i$ where $I_1, \dots, I_n$ are distinct aligned vertical (resp., horizontal) segments. Then for
Now we are ready to prove the bounds of our second solution.
Theorem 14. There exists an R-RCP data structure $D_2$ such that
• For any $S \subseteq \mathbb{R}^2$ of size n, $\mathrm{Qtime}(D_2(S)) = O(\log n)$.
• For a random $S \propto R^n$ where R is the unit square or, more generally, an arbitrary axes-parallel rectangle, $E[\mathrm{Space}(D_2(S))] = O(n \log n)$.

Combining the two solutions
We now combine the two data structures $D_1$ (Theorem 11) and $D_2$ (Theorem 14) to obtain a single data structure D that achieves the desired worst-case and average-case bounds simultaneously. For a dataset $S \subseteq \mathbb{R}^2$ of size n, if $\mathrm{Space}(D_2(S)) \ge n \log^2 n$, we set $D(S) = D_1(S)$; otherwise we set $D(S) = D_2(S)$. The worst-case bounds of D follow directly, whereas the average-case bounds require a careful analysis using Markov's inequality.
Theorem 15. There exists an R-RCP data structure D such that
• For any $S \subseteq \mathbb{R}^2$ of size n, $\mathrm{Space}(D(S)) = O(n \log^2 n)$ and $\mathrm{Qtime}(D(S)) = O(\log^2 n)$.
• For a random $S \propto R^n$ where R is the unit square or, more generally, an arbitrary axes-parallel rectangle, $E[\mathrm{Space}(D(S))] = O(n \log n)$ and $E[\mathrm{Qtime}(D(S))] = O(\log n)$.
Proof. As mentioned above, our data structure D is obtained by combining $D_1$ (Theorem 11) and $D_2$ (Theorem 14) as follows. For any $S \subseteq \mathbb{R}^2$ of size n, if $\mathrm{Space}(D_2(S)) \ge n \log^2 n$, we set $D(S) = D_1(S)$; otherwise we set $D(S) = D_2(S)$. We claim that D satisfies the desired bounds. Let $S \subseteq \mathbb{R}^2$ be a dataset of size n. It is clear from the construction that $\mathrm{Space}(D(S)) = O(n \log^2 n)$. Also, $\mathrm{Qtime}(D(S)) = O(\log^2 n)$, since $\mathrm{Qtime}(D_1(S)) = O(\log^2 n)$ and $\mathrm{Qtime}(D_2(S)) = O(\log n)$. To analyze the average-case performance of D, let $S \propto R^n$ for an axes-parallel rectangle R. Define E as the event $\mathrm{Space}(D_2(S)) \ge n \log^2 n$ and $\neg E$ as its complement (i.e., the event $\mathrm{Space}(D_2(S)) < n \log^2 n$). Since $E[\mathrm{Space}(D_2(S))] = O(n \log n)$, we have $\Pr[E] = O(1/\log n)$ by Markov's inequality. To bound the average-case space cost, we observe To bound the average-case query time, let $T_i$ be the worst-case query time of $D_i$ built on a dataset of size
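The two average-case estimates at the end of this proof can be written out as follows. This is a sketch under the assumptions already used above: in the worst case, $\mathrm{Space}(D_1(S)) = O(n \log^2 n)$, $T_1 = O(\log^2 n)$ and $T_2 = O(\log n)$. We write $\mathbb{E}$ for expectation to avoid a clash with the event $E$.

```latex
\begin{align*}
\Pr[E] &= \Pr\bigl[\mathrm{Space}(D_2(S)) \ge n\log^2 n\bigr]
        \le \frac{\mathbb{E}[\mathrm{Space}(D_2(S))]}{n\log^2 n}
        = O\!\left(\frac{1}{\log n}\right), \\
\mathbb{E}[\mathrm{Space}(D(S))]
  &= \mathbb{E}\bigl[\mathrm{Space}(D_2(S))\,\mathbf{1}_{\neg E}\bigr]
   + \mathbb{E}\bigl[\mathrm{Space}(D_1(S))\,\mathbf{1}_{E}\bigr]
   \le \mathbb{E}[\mathrm{Space}(D_2(S))] + O(n\log^2 n)\cdot\Pr[E]
   = O(n\log n), \\
\mathbb{E}[\mathrm{Qtime}(D(S))]
  &\le T_2 + T_1\cdot\Pr[E]
   = O(\log n) + O(\log^2 n)\cdot O\!\left(\frac{1}{\log n}\right)
   = O(\log n).
\end{align*}
```

The threshold $n \log^2 n$ is what makes both products balance: it is large enough that Markov's inequality gives $\Pr[E] = O(1/\log n)$, and small enough that the fallback $D_1$ only costs an extra $O(\log n)$ factor.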

Halfplane query
We consider the RCP problem for halfplane queries, i.e., the H-RCP problem. In order to solve the H-RCP problem, it suffices to consider the H ↑ -RCP problem. Let S ⊆ R 2 be the dataset of size n.
We shall apply the standard duality technique [3]. A non-vertical line l : To make the exposition cleaner, we distinguish between the primal space and the dual space, which are two copies of $\mathbb{R}^2$. The dataset S and the query ranges lie in the primal space, while their dual objects lie in the dual space. Duality allows us to transform the $\mathcal{H}^{\uparrow}$-RCP problem into a point-location problem as follows. Let $H = l^{\uparrow} \in \mathcal{H}^{\uparrow}$ be a query range. The line l bounding H is dual to the point $l^*$ in the dual space; for convenience, we also call $l^*$ the dual point of H. If we decompose the dual space into "cells" such that all query ranges whose dual points lie in the same cell have the same answer, then point-location techniques can be applied to solve the problem directly. This decomposition is a polygonal subdivision Γ of $\mathbb{R}^2$, consisting of vertices, straight-line edges, and polygonal faces (i.e., cells), because the cell boundaries are formed by portions of the dual lines of the points in S. To analyze the space cost and query time, we need to study the complexity |Γ| of Γ. A trivial $O(n^2)$ upper bound on |Γ| follows from the fact that the arrangement of the n dual lines of the points in S has complexity $O(n^2)$. Surprisingly, using additional properties of the problem, we can show that $|\Gamma| = O(n)$.
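The duality definition itself is cut off in the extracted text, so the sketch below assumes one common convention: a point $p = (p_x, p_y)$ maps to the line $p^*: y = p_x x - p_y$, and a line $l: y = ux + v$ maps to the point $l^* = (u, -v)$. Under this convention, $p$ lies above $l$ iff $l^*$ lies above $p^*$, which is the order-reversing property the reduction to point location relies on. The function names are hypothetical.

```python
def dual_of_line(u, v):
    """Non-vertical line l: y = u*x + v  ->  dual point l* = (u, -v)."""
    return (u, -v)

def dual_of_point(p):
    """Point p = (px, py)  ->  dual line p*: y = px*x - py, returned as (slope, intercept)."""
    return (p[0], -p[1])

def strictly_above(q, line):
    """Whether point q lies strictly above the non-vertical line given as (slope, intercept)."""
    slope, intercept = line
    return q[1] > slope * q[0] + intercept

# Order reversal: p lies above l  <=>  l* lies above p*.
p, l = (3, 7), (2, -1)  # l: y = 2x - 1
print(strictly_above(p, l), strictly_above(dual_of_line(*l), dual_of_point(p)))  # True True
```

Integer inputs keep the comparison exact; with floating-point data, points near a line would need a robust orientation test.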
Let $\Phi(S, \mathcal{H}^{\uparrow}) = \{\phi_1, \dots, \phi_m\}$ be the set of candidate pairs, where $\phi_i = (a_i, b_i)$ and $\phi_1, \dots, \phi_m$ are sorted in increasing order of their lengths. It is shown in [1] that $m = O(n)$ and that the candidate pairs do not cross each other (when identified as segments), i.e., the segments $[a_i, b_i]$ and $[a_j, b_j]$ do not cross for any $i \ne j$. The non-crossing property of the candidate pairs is important and will be used later for proving Lemma 16. With this in hand, we now consider the subdivision Γ. Let $H = l^{\uparrow} \in \mathcal{H}^{\uparrow}$ be a query range. By the property of duality, $\phi_i$ is contained in H (i.e., $a_i, b_i \in l^{\uparrow}$) iff $l^*$ is in the upward-open wedge $W_i$ generated by the lines $a_i^*$ and $b_i^*$ (in the dual space); see Figure 4. As such, the closest-pair in $S \cap H$ to be reported is $\phi_\eta$ for $\eta = \min\{i : l^* \in W_i\}$. Therefore, Γ can be constructed by successively overlaying the wedges $W_1, \dots, W_m$ (similarly to what we see in Section 2). Formally, we begin with a trivial subdivision $\Gamma_0$ of $\mathbb{R}^2$, which consists of only one face, the entire plane. Suppose $\Gamma_{i-1}$ is constructed, which has an outer face $F_{i-1}$ equal to the complement of $\bigcup_{j=1}^{i-1} W_j$ in $\mathbb{R}^2$. Now we construct a new subdivision $\Gamma_i$ by "inserting" $W_i$ into $\Gamma_{i-1}$. Specifically, $\Gamma_i$ is obtained from $\Gamma_{i-1}$ by decomposing the outer face $F_{i-1}$ via the wedge $W_i$; that is, we decompose $F_{i-1}$ into several smaller faces: one is $F_{i-1} \setminus W_i$ and the others are the connected components of $F_{i-1} \cap W_i$. The face $F_{i-1} \setminus W_i$ is connected (as one can easily verify) and becomes the outer face $F_i$ of $\Gamma_i$. In this way, we construct $\Gamma_1, \dots, \Gamma_m$ in order, and clearly $\Gamma_m = \Gamma$. The linear upper bound on |Γ| follows from the following technical result.
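The query rule $\eta = \min\{i : l^* \in W_i\}$ can be checked directly. The sketch assumes the duality convention $l: y = ux + v \mapsto l^* = (u, -v)$ and $p \mapsto p^*: y = p_x x - p_y$ (the paper's exact convention is not recoverable from this text). It replaces point location by a linear scan over a length-sorted list, so each query costs $O(m)$ here rather than $O(\log m)$. Scanning all pairs instead of only the candidate pairs gives the same answer, because the first pair found inside $H$ is a shortest pair inside $H$. Function names are hypothetical.

```python
from itertools import combinations
from math import dist

def in_wedge(u, v, a, b):
    """l* = (u, -v) lies in the upward-open wedge above the dual lines a*, b*, i.e.
    both a and b lie strictly above l: y = u*x + v (assumed duality convention)."""
    return all(-v > p[0] * u - p[1] for p in (a, b))

def halfplane_closest_pair(pairs_by_length, u, v):
    """Answer for H = l^up with l: y = u*x + v: the first pair phi_i (in increasing
    length) with l* in W_i, i.e. phi_eta with eta = min{i : l* in W_i}."""
    for a, b in pairs_by_length:
        if in_wedge(u, v, a, b):
            return (a, b)
    return None

def brute_force(points, u, v):
    """Reference answer: closest pair among the points strictly above l."""
    above = [p for p in points if p[1] > u * p[0] + v]
    pairs = list(combinations(above, 2))
    return min(pairs, key=lambda pr: dist(*pr)) if pairs else None

pts = [(0, 0), (1, 3), (2, 1), (4, 4), (5, 2), (3, 6)]
# Using every pair (a superset of the candidate pairs) does not change the answer.
PAIRS = sorted(combinations(pts, 2), key=lambda pr: dist(*pr))
```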
[Figure 5h: An example for Case 3 where $(l_i')^* \in r$ and $r \cap W_j$ contains $(l_i')^*$ but contains neither $r_0$ nor the infinite end of r.]
Proof. Let $F_i$ be the outer face of $\Gamma_i$, and let $\partial W_i$ be the boundary of the wedge $W_i$ (which consists of two rays emanating from the intersection point of $a_i^*$ and $b_i^*$). We first note that, to deduce $|\Gamma| = O(m)$, it suffices to show that the number of connected components of $\partial W_i \cap F_{i-1}$ is constant. This is because every connected component of $\partial W_i \cap F_{i-1}$ contributes to $\Gamma_i$ exactly one new face, a constant number of new vertices, and a constant number of new edges. In fact, we only need to check one branch of $\partial W_i$ (i.e., one of its two rays), say the ray contained in $a_i^*$, which we denote by r; let $r_0$ be the initial point of r (i.e., the vertex of $W_i$). We will show that $r \cap F_{i-1}$ has O(1) connected components. Without loss of generality, we may assume that $a_i$ is to the left of $b_i$. Then each point on r is dual to a line in the primal space that goes through the point $a_i$, and each $r \cap W_j$ is a connected portion of r (as $W_j$ is convex). We consider each $j \in \{1, \dots, i-1\}$ and analyze the intersection $r \cap W_j$. Let $l_i$ be the line through $a_i$ and $b_i$. There are three cases to be considered separately: We claim that either $r \cap W_j$ is empty or it contains the infinite end of r (i.e., the point at infinity along r). Imagine that a point p moves along r from $r_0$ to the infinite end of r. Then p is dual to a line in the primal space rotating clockwise around $a_i$ from the line $l_i$ to the vertical line through $a_i$; see Figure 5b. Note that $p \in r \cap W_j$ (in the dual space) only when $a_j, b_j \in (p^*)^{\uparrow}$ (in the primal space). But $a_j, b_j \in l_i^{\downarrow}$ in this case. As p moves, the region $l_i^{\downarrow} \cap (p^*)^{\uparrow}$ expands. As such, one can easily see that $r \cap W_j$ must contain the infinite end of r if it is nonempty. (See Figures 5c and 5d.) If c is to the left of $a_i$, we claim that $r \cap W_j$ is empty.
Observe that the dual line of any point on r passes through $a_i$ and below $b_i$, meaning that it must be above c (as c is to the left of $a_i$). In other words, the dual line of any point on r is above at least one of $a_j, b_j$, and thus no point on r is contained in the wedge $W_j$, i.e., $r \cap W_j$ is empty. (See Figure 5e.) The subtlest case occurs when c is to the right of $b_i$. In this case, we consider the line through $a_i$ perpendicular to $l_i$, which we denote by $l_i'$. We first argue that both $a_j$ and $b_j$ must be on the same side of $l_i'$ as $b_i$. Since c is to the right of $b_i$, at least one of $a_j, b_j$ is on the same side of $l_i'$ as $b_i$. However, $[a_j, b_j]$ cannot intersect $l_i'$, since otherwise the length of $\phi_j$ would be (strictly) greater than that of $\phi_i$, contradicting the fact that $j < i$ (recall that $\phi_1, \dots, \phi_m$ are sorted in increasing order of their lengths). So the only possibility is that $a_j, b_j, b_i$ are on the same side of $l_i'$. Now we further have two sub-cases.
• $l_i'$ has no dual point (i.e., $l_i'$ is vertical) or its dual point $(l_i')^*$ is not on the ray r. In this case, consider a point p moving along r from $r_0$ to the infinite end of r. Clearly, as p moves, the region $(l_i')^{\rightarrow} \cap (p^*)^{\uparrow}$ expands. Thus, either $r \cap W_j$ is empty or it contains the infinite end of r. (See Figures 5f and 5g.)
• $(l_i')^*$ is on r. Then $r \cap W_j$ may be a connected portion of r containing neither $r_0$ nor the infinite end of r. However, as $b_i \in (l_i')^{\uparrow}$ in this case, we have $a_j, b_j \in (l_i')^{\uparrow}$ (recall that $a_j, b_j, b_i$ are on the same side of $l_i'$). This implies that $r \cap W_j$ contains $(l_i')^*$. (See Figure 5h.)
In sum, for any $j \in \{1, \dots, i-1\}$, the intersection $r \cap W_j$ is (i) empty, (ii) a connected portion of r containing $r_0$, (iii) a connected portion of r containing the infinite end of r, or (iv) a connected portion of r containing $(l_i')^*$ (if $(l_i')^*$ is on r). As such, the union $\bigcup_{j=1}^{i-1} (r \cap W_j)$ has at most three connected components: one containing $r_0$, one containing the infinite end of r, and one containing $(l_i')^*$. Therefore, the complement of this union in r, which is exactly $r \cap F_{i-1}$, has at most two connected components. With the above result in hand, we can build an optimal point-location data structure for Γ using O(m) space and O(log m) query time to solve the $\mathcal{H}^{\uparrow}$-RCP problem. Since $m = O(n)$, we obtain an $\mathcal{H}$-RCP data structure using O(n) space and O(log n) query time in the worst case.
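The counting step that closes the proof is elementary but easy to get wrong. Parameterize r by distance from $r_0$, so that $r_0$ sits at 0 and the infinite end at $\infty$. Each $r \cap W_j$ is then a closed interval containing 0, $\infty$, or the position of the dual point of the line through $a_i$ perpendicular to $l_i$, and the uncovered part of r has at most two components. The helper below (a hypothetical name, with our own parameterization) counts uncovered components; the last test line shows that arbitrary intervals can leave more gaps, which is why the case analysis matters.

```python
import math

def uncovered_components(intervals):
    """Number of connected components of the ray [0, inf) that are NOT covered by
    the union of the given closed intervals (lo, hi), with 0 <= lo <= hi <= inf."""
    gaps, reach = 0, None  # reach: right end of the covered part seen so far
    for lo, hi in sorted(intervals):
        if reach is None:  # first interval: is [0, lo) left uncovered?
            gaps += lo > 0
            reach = hi
        elif lo > reach:  # a gap (reach, lo) between consecutive covered blocks
            gaps += 1
            reach = hi
        else:
            reach = max(reach, hi)
    if reach is None:  # nothing covered: the whole ray is one component
        return 1
    return gaps + (reach < math.inf)

# Intervals each containing 0, the middle point t = 5, or the infinite end:
intervals = [(0, 1), (0, 2), (4, 5), (3, 6), (9, math.inf), (8, math.inf)]
print(uncovered_components(intervals))  # 2
```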
Next, we analyze the average-case bounds of the above data structure. In fact, it suffices to bound the expected number of the candidate pairs. Surprisingly, we have the following poly-logarithmic bound.
Lemma 17. For a random dataset $S \propto R^n$ where R is an axes-parallel rectangle, $E[|\Phi(S, \mathcal{H})|] = O(\log^2 n)$.

Now we can conclude the following.
Theorem 18. There exists an H-RCP data structure E such that
• For any $S \subseteq \mathbb{R}^2$ of size n, $\mathrm{Space}(E(S)) = O(n)$ and $\mathrm{Qtime}(E(S)) = O(\log n)$.
• For a random $S \propto R^n$ where R is the unit square or, more generally, an arbitrary axes-parallel rectangle, $E[\mathrm{Space}(E(S))] = O(\log^2 n)$ and $E[\mathrm{Qtime}(E(S))] = O(\log \log n)$.

Preprocessing
In this section, we show how to build the H-RCP data structure of Theorem 18 in $O(n \log^2 n)$ time. It suffices to consider the $\mathcal{H}^{\uparrow}$-RCP data structure described in Section 5. The key step in building this data structure is to construct the subdivision Γ of the dual plane (see Section 5). Since $|\Gamma| = O(n)$, once Γ is constructed, the point-location data structure for Γ, and hence our $\mathcal{H}^{\uparrow}$-RCP data structure, can be built in $O(n \log n)$ time.
Let us first consider an easier task, in which $\Phi(S, \mathcal{H}^{\uparrow})$ is given beforehand. In this case, we show that Γ can be constructed in $O(n \log n)$ time. As in Section 5, suppose $\Phi(S, \mathcal{H}^{\uparrow}) = \{\phi_1, \dots, \phi_m\}$ where $\phi_1, \dots, \phi_m$ are sorted in increasing order of their lengths. Recall that in Section 5 we defined the subdivisions $\Gamma_0, \dots, \Gamma_m$. Our basic idea is to begin with $\Gamma_0$ and iteratively construct $\Gamma_i$ from $\Gamma_{i-1}$ by inserting the wedge $W_i$ dual to $\phi_i$. In this process, the crucial task is to maintain the outer face $F_i$ (or rather its boundary). Note that the boundary $\partial F_i$ of $F_i$ (i.e., the upper envelope of $F_i$) is an x-monotone polygonal chain consisting of segments and two infinite rays; we call such chains left-right polylines and call their pieces fractions. Naturally, a binary search tree can be used to store a left-right polyline, with its fractions as keys in left-to-right order. Therefore, we use a (balanced) BST T to maintain $\partial F_i$: at the end of the i-th iteration, the left-right polyline stored in T is $\partial F_i$. At each node of T, besides the corresponding fraction, we also store the wedge $W_j$ that contributes this fraction.
Suppose we are now at the beginning of the i-th iteration. We have $\Gamma_{i-1}$ in hand and T stores $\partial F_{i-1}$. We need to "insert" the wedge $W_i$ to generate $\Gamma_i$ from $\Gamma_{i-1}$, and update T. The first step is to compute $\partial W_i \cap F_{i-1}$. For now, assume that $\partial W_i \cap F_{i-1}$ has already been computed in $O(\log |T|)$ time; we explain later how to achieve this. With $\partial W_i \cap F_{i-1}$ in hand, constructing $\Gamma_i$ is fairly easy. By the proof of Lemma 16, $\partial W_i \cap F_{i-1}$ has O(1) connected components. We consider these components one by one. Let ξ be a component, which is an x-monotone polygonal chain with endpoints (if any) on $\partial F_{i-1}$ (indeed, ξ consists of at most two pieces, as it is a portion of $\partial W_i$). For convenience, assume ξ has a left endpoint u and a right endpoint v. Then ξ contributes a new (inner) face to $\Gamma_i$, namely the region bounded by ξ and the portion σ of $\partial F_{i-1}$ between u and v. We then use T to report, in left-to-right order, all the fractions of $\partial F_{i-1}$ intersecting σ, from which the corresponding new face can be constructed directly. Reporting the fractions takes $O(\log |T| + k)$ time, where k is the number of reported fractions; see Appendix B for implementation details. After all the components are considered, we construct $\Gamma_i$ by adding the new faces to $\Gamma_{i-1}$ (and adjusting the involved edges/vertices if needed). As there are O(1) components, the total time for constructing $\Gamma_i$ from $\Gamma_{i-1}$ is $O(\log |T| + K_i)$, where $K_i$ is the total number of fractions reported from T. We can charge the reported fractions to the corresponding new faces, and each face is charged at most as many fractions as it has edges. Therefore, $\sum_{i=1}^{m} K_i = O(|\Gamma|) = O(m)$, and this part of the time cost is amortized $O(\log m)$ per iteration. The remaining task is to update the left-right polyline stored in T to $\partial F_i$, which can also be done in amortized $O(\log m)$ time per iteration (see Appendix B for implementation details).
As such, the overall time cost for constructing Γ is O(m log m), and thus O(n log n).
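In the procedure above, the left-right polyline T answers two kinds of queries. One locates the fraction above a given abscissa. The other reports the fractions between the endpoints u and v of a component ξ, in $O(\log |T| + k)$ time. The minimal sketch below uses a sorted array of breakpoints as a stand-in for the balanced BST. Lookups match the stated bounds, but splicing in new fractions would cost $O(|T|)$ here, so the amortized updates of Appendix B are not modeled. The class and field names are hypothetical.

```python
from bisect import bisect_right

class LeftRightPolyline:
    """Sorted-array stand-in for the balanced BST T storing an x-monotone chain.
    Fraction k spans the x-range between breakpoints bx[k-1] and bx[k], where the
    out-of-range ends are read as -inf and +inf (the two infinite rays); owner[k]
    is the index j of the wedge W_j that contributes fraction k."""

    def __init__(self, breakpoints, owners):
        assert len(owners) == len(breakpoints) + 1
        assert all(a < b for a, b in zip(breakpoints, breakpoints[1:]))
        self.bx, self.owner = list(breakpoints), list(owners)

    def locate(self, x):
        """Index of the fraction above abscissa x (a vertex goes to its right fraction)."""
        return bisect_right(self.bx, x)

    def report(self, xu, xv):
        """Fractions meeting the x-range [xu, xv], left to right, as (index, owner) pairs."""
        return [(k, self.owner[k]) for k in range(self.locate(xu), self.locate(xv) + 1)]

T = LeftRightPolyline([0.0, 2.0, 5.0], owners=[7, 3, 3, 9])
print(T.locate(3.0), T.report(1.0, 3.0))  # 2 [(1, 3), (2, 3)]
```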
We now explain the missing part of the above algorithm: computing $\partial W_i \cap F_{i-1}$ in $O(\log |T|)$ time. Let r be the left ray of $\partial W_i$ and $r_0$ be the initial point of r (i.e., the vertex of $W_i$). It suffices to compute $r \cap F_{i-1}$. Recall that $l_i$ is the line through $a_i, b_i$ and $l_i'$ is the line through $a_i$ perpendicular to $l_i$. Assume $(l_i')^* \in r$ (the case $(l_i')^* \notin r$ is in fact easier). The point $(l_i')^*$ partitions r into a segment $s = [r_0, (l_i')^*]$ and a ray $r'$ emanating from $(l_i')^*$, where $r'$ is to the left of s. By the proof of Lemma 16, each wedge $W_j$ for $j \in \{1, \dots, i-1\}$ with $W_j \cap r \ne \emptyset$ satisfies at least one of the following: (1) $r_0 \in W_j$, (2) $(l_i')^* \in W_j$, (3) $W_j$ contains the infinite end of r. Therefore, $r \cap F_{i-1}$ has at most two connected components; if it has two, one is contained in $r'$ and the other in s. As such, $r'$ contains at most one left endpoint and one right endpoint of (some component of) $r \cap F_{i-1}$, and so does s. We show that these endpoints can be found by searching in T. Suppose we want to find the left endpoint z contained in $r'$ (assuming it exists). Let γ be a fraction of $\partial F_{i-1}$ contributed by the wedge $W_j$ for some $j \in \{1, \dots, i-1\}$. It is easy to verify that γ contains z iff γ intersects $r'$ and $W_j$ contains the infinite end of r. Also, γ is to the left of z iff $\gamma \subseteq R$ and $W_j$ contains the infinite end of r, where R is the region to the left of $(l_i')^*$ and above $r'$. As such, one can simply search in T to find the fraction γ containing z in $O(\log |T|)$ time, if z exists. (If z does not exist, the search in T verifies its non-existence, as the desired fraction γ is never found.) The right endpoint contained in $r'$ and the left/right endpoints contained in s can be computed in a similar fashion. With these endpoints in hand, $r \cap F_{i-1}$ can be computed straightforwardly.
The other case, $(l_i')^* \notin r$, is handled similarly and more easily, as then $r \cap F_{i-1}$ has at most one connected component. Therefore, $r \cap F_{i-1}$ (and thus $\partial W_i \cap F_{i-1}$) can be computed in $O(\log |T|)$ time.
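The search for the endpoint z is an ordinary binary search over the fractions of T, driven by a three-way test per fraction: to the left of z, containing z, or otherwise. The characterization above guarantees that the outcomes are monotone along T. The sketch leaves the geometric predicates abstract as a callback `classify` (a hypothetical name). A toy classifier based on x-ranges stands in for the real tests involving R, the ray, and whether $W_j$ contains the infinite end of r.

```python
import math

def find_fraction(num_fractions, classify):
    """Binary search over fractions 0..num_fractions-1 (left to right) for the one
    containing z. classify(k) must return -1 if fraction k lies to the left of z,
    0 if it contains z, and +1 otherwise, with monotone answers (all -1's, then at
    most one 0, then all +1's). Returns None if no fraction contains z."""
    lo, hi = 0, num_fractions - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = classify(mid)
        if c == 0:
            return mid
        if c < 0:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Toy stand-in: fractions given by x-ranges, z given by its abscissa.
spans = [(-math.inf, 0), (0, 2), (2, 5), (5, math.inf)]

def by_x(z):
    return lambda k: -1 if spans[k][1] < z else (0 if spans[k][0] <= z <= spans[k][1] else 1)

print(find_fraction(len(spans), by_x(3.0)))  # 2
```

When z does not exist, the callback never answers 0 and the search ends with `None`, matching the non-existence check described in the text.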
Next, we consider how to construct Γ when we are only given the dataset S. It is shown in [1] that one can compute in $O(n \log^2 n)$ time a set Ψ of pairs of points in S such that $\Phi(S, \mathcal{H}^{\uparrow}) \subseteq \Psi$ and $|\Psi| = O(n \log n)$. We use that method to compute Ψ, and suppose $\Psi = \{\psi_1, \dots, \psi_M\}$ where $\psi_1, \dots, \psi_M$ are sorted in increasing order of their lengths. The m candidate pairs $\phi_1, \dots, \phi_m \in \Phi(S, \mathcal{H}^{\uparrow})$ are among $\psi_1, \dots, \psi_M$. Let $i_1 < \cdots < i_m$ be the indices such that $\phi_1 = \psi_{i_1}, \dots, \phi_m = \psi_{i_m}$ (note that at this point we do not know $i_1, \dots, i_m$). We consider $\psi_1, \dots, \psi_M$ in order. When considering $\psi_i$, we want to verify whether $\psi_i$ is a candidate pair. If this can be done, the candidate pairs $\phi_1, \dots, \phi_m$ will be found in order. Whenever a new candidate pair $\phi_k$ is found, we construct $\Gamma_k$ from $\Gamma_{k-1}$ in (amortized) $O(\log m)$ time by the approach above. Now assume $\psi_1, \dots, \psi_{i-1}$ have already been considered, the candidate pairs in $\{\psi_1, \dots, \psi_{i-1}\}$ have been recognized (say they are $\phi_1, \dots, \phi_{k-1}$), and $\Gamma_{k-1}$ has been constructed. We then consider $\psi_i$ and need to decide whether $\psi_i$ is a candidate pair, i.e., whether $\psi_i = \phi_k$. Let W be the wedge corresponding to $\psi_i$ in the dual plane. Observe that if $W \subseteq \bigcup_{j=1}^{k-1} W_j$, then every halfplane in $\mathcal{H}^{\uparrow}$ containing $\psi_i$ also contains some $\phi_j$ with $j < k$, which precedes $\psi_i$ in the length order, so $\psi_i$ is not a candidate pair. Conversely, if $W \not\subseteq \bigcup_{j=1}^{k-1} W_j$, then there exists some halfplane $H \in \mathcal{H}^{\uparrow}$ such that H contains $\psi_i$ and does not contain $\phi_1, \dots, \phi_{k-1}$. Then the closest-pair in $S \cap H$ cannot be in $\{\psi_1, \dots, \psi_{i-1}\}$ but must be in Ψ, hence it is nothing but $\psi_i$. Based on this observation, we can verify whether $\psi_i = \phi_k$ as follows. We assume $\psi_i = \phi_k$ and try to use it to construct $\Gamma_k$ from $\Gamma_{k-1}$ by the approach above. If our assumption is correct, then $\Gamma_k$ is successfully constructed in $O(\log m)$ time. Furthermore, in the process of constructing $\Gamma_k$, our approach allows us to find a point in $W \setminus \bigcup_{j=1}^{k-1} W_j$, which we call a witness point. This witness point then certifies the correctness of our assumption.
On the other hand, if our assumption is wrong, the process still terminates in $O(\log m)$ time, but we never find such a witness point because $W \subseteq \bigcup_{j=1}^{k-1} W_j$. In this case, we simply discard $\psi_i$ and continue with $\psi_{i+1}$. After all pairs in Ψ are considered, we have recognized all m candidate pairs and constructed $\Gamma = \Gamma_m$. Since $m = O(n)$ and $|\Psi| = O(n \log n)$, the overall process takes $O(n \log^2 n)$ time.

Application to the H-RSS problem
Interestingly, our approach for the H-RCP problem also applies to the H-RSS problem, and leads to an optimal H-RSS data structure for interior-disjoint (i.e., non-crossing) segments, which is of independent interest.
Theorem 19 There exists an H-RSS data structure F such that for any set G of n interior-disjoint (i.e., non-crossing) segments in R 2 , Space(F(G)) = O(n), Qtime(F(G)) = O(log n), and F(G) can be built in O(n log n) time.
Proof. The data structure is basically identical to the H-RCP data structure given in Section 5. Let $\sigma_1, \dots, \sigma_n$ be the interior-disjoint segments in G sorted in increasing order of their lengths, and let $W_i$ be the wedge dual to $\sigma_i$. We successively overlay the wedges $W_1, \dots, W_n$ to create a subdivision Γ of the dual space, as we do in Section 5 for the candidate pairs. A point-location data structure on Γ is then our H-RSS data structure for G. Note that Lemma 16 can be applied to show $|\Gamma| = O(n)$, because the proof of Lemma 16 only uses the facts that the candidate pairs do not cross each other and that the wedges are inserted in increasing order of the lengths of their corresponding candidate pairs (here the segments $\sigma_1, \dots, \sigma_n$ are also non-crossing and sorted in increasing order of their lengths). As such, the space cost of the data structure is O(n) and the query time is O(log n). In Section 5.1, we show that if the candidate pairs are already given, our H-RCP data structure can be built in O(n log n) time. It follows that our H-RSS data structure can be built in O(n log n) time, as the segments are given directly in this case.
[Upper bound] First, we prove the upper bound for $E[\kappa^p(A)]$. To this end, we need to study the distribution of the random variable $\kappa(A)$. For convenience, we assume m is even and sufficiently large. We make the following claims. $\mathrm{Area}(D_j \cap R)$.

Now assume we have a lower bound µ for all
Note that $E_{\delta,i}$ happens only if $a_i \notin U \cap R$. Therefore, If $\delta \ge 2\Delta$, then $\mathrm{Area}(D_j \cap R) \ge \delta\Delta/18$ for any j (as argued at the beginning of the proof), so we can set $\mu = \delta\Delta/18$. The above inequality directly implies claim (1). If $\delta \in (0, 2\Delta]$, then $\mathrm{Area}(D_j \cap R) \ge \delta^2/36$ for any j (as argued at the beginning of the proof), so we can set $\mu = \delta^2/36$. The above inequality directly implies claim (2). With the two claims in hand, we now prove the lemma. We shall use the formula Set $q = 1/p$. For $t \ge \alpha^p$, we have $t^q \ge \alpha > 2\Delta'/m^2 \ge 2\Delta$. Therefore, by applying claim (1) above we have It follows that The integration Again, let $\alpha = 72\Delta'/m^2$. By applying claim (1) above, we have The rightmost equality above follows from Equation 2. If $\beta \le 2\Delta$, then It suffices to show . By claim (2) above, Since the integration . . , m} such that $i \ne j$. Set $\delta_1 = \Delta'/(2m^2)$. We observe that $\Pr[\mathrm{dist}(a_i, a_j) \le \delta_1] \le 1/m^2$. Indeed, if D is the disc centered at $a_i$ with radius $\delta_1$, then we always have $\mathrm{Area}(D \cap R) \le 2\delta_1\Delta = \Delta\Delta'/m^2$ (as argued at the beginning of the proof), and hence By union bound, we have We again observe that $\Pr[\mathrm{dist}(a_i, a_j) \le \delta_2] \le 1/m^2$ for any distinct $i, j \in \{1, \dots, m\}$. Indeed, if D is the disc centered at $a_i$ with radius $\delta_2$, then we always have $\mathrm{Area}(D \cap R) \le 4\delta_2^2 = \Delta\Delta'/m^2$ (as argued at the beginning of the proof), and hence Applying the same argument as above, we can deduce . This proves the lower bound for $E[\kappa^p(A)]$. The above proof applies straightforwardly to the special case in which R is a segment. $\Pr[E_{i,j}]$.
Note that under the condition $C_{\tilde{x},\tilde{y},J}$, Q is just $(-\infty, \tilde{x}] \times (-\infty, \tilde{y}]$. Thus the condition $C_{\tilde{x},\tilde{y},J}$ is equivalent to saying that the maximum of the x-coordinates (resp., y-coordinates) of $a_1, a_2$ is $\tilde{x}$ (resp., $\tilde{y}$), all $a_j$ for $j \in J$ are contained in the rectangle $R' = [0, \tilde{x}] \times [0, \tilde{y}]$, and all $a_j$ for $j \in \{3, \dots, n\} \setminus J$ are in $R \setminus R'$. As such, one can easily verify that, under the condition $C_{\tilde{x},\tilde{y},J}$, the random number $\delta_x$ (resp., $\delta_y$) is uniformly distributed on the interval $[0, \tilde{x}]$ (resp., $[0, \tilde{y}]$), and the m random points in $S_J$ are uniformly distributed on $R'$; furthermore, these random numbers/points are independent of each other. In other words, if we consider a new random experiment in which we independently generate two random numbers $\delta_x', \delta_y'$ from the uniform distributions on $[0, \tilde{x}]$ and $[0, \tilde{y}]$, respectively (which correspond to $\delta_x, \delta_y$), and a random dataset $S' \propto (R')^m$ (which corresponds to $S_J$), then we have So it suffices to bound $\Pr[(\delta_x' \le \kappa(S')) \wedge (\delta_y' \le \kappa(S'))]$ in the new experiment; we denote this probability by λ. We apply the formula where $p(\cdot)$ is the probability density function of $\kappa(S')$. Since $\delta_x'$ (resp., $\delta_y'$) is drawn uniformly from the interval $[0, \tilde{x}]$ (resp., $[0, \tilde{y}]$), we have $\Pr[\delta_x' \le t] = \min\{t/\tilde{x}, 1\}$ (resp., $\Pr[\delta_y' \le t] = \min\{t/\tilde{y}, 1\}$). Without loss of generality, we assume $\tilde{x} \le \tilde{y}$. Then we have Noting the fact that
[Step 2] In order to apply the result of Step 1 to bound $\Pr[E_{1,2}]$, we need to bound $\Pr[|\Lambda| = m]$ for $m \in \{0, \dots, n-2\}$. This is a purely combinatorial problem, because the random variable $|\Lambda|$ only depends on the orderings of the x-coordinates and y-coordinates of $a_1, \dots, a_n$. The ordering of the x-coordinates (x-ordering for short) of $a_1, \dots, a_n$ can be represented as a permutation of $\{a_1, \dots, a_n\}$; so can the y-ordering. Thus, in terms of the ordering of coordinates, there are $(n!)^2$ different configurations of S, each of which can be represented by a pair $(\pi, \pi')$ of permutations of $\{a_1, \dots, a_n\}$ where π (resp., π') represents the x-ordering (resp., y-ordering) of $a_1, \dots, a_n$; that is, if $\pi = (a_{i_1}, \dots, a_{i_n})$ and $\pi' = (a_{i_1'}, \dots, a_{i_n'})$, then $a_{i_1}.x < \cdots < a_{i_n}.x$ and $a_{i_1'}.y < \cdots < a_{i_n'}.y$ (we can ignore the degenerate case in which two random points have the same x-coordinates or y-coordinates, because the random points in S have distinct coordinates with probability 1). Note that every configuration occurs with the same probability $1/(n!)^2$. If S has the configuration $(\pi, \pi')$, then Λ is just the subset of $\{3, \dots, n\}$ consisting of all i such that $\mathrm{rk}_\pi(a_i) \le \max\{\mathrm{rk}_\pi(a_1), \mathrm{rk}_\pi(a_2)\}$ and $\mathrm{rk}_{\pi'}(a_i) \le \max\{\mathrm{rk}_{\pi'}(a_1), \mathrm{rk}_{\pi'}(a_2)\}$, where the function rk computes the rank of an element in a permutation (i.e., the position of the element in the permutation). Therefore, we can pass to a new random experiment in which we independently and uniformly generate the two permutations π, π' of $\{a_1, \dots, a_n\}$ (i.e., uniformly generate a configuration of S), and study $\Pr[|\Lambda| = m]$ in this new experiment. Write $r_j = \mathrm{rk}_\pi(a_j)$. We first compute $\Pr[\max\{r_1, r_2\} = i]$. By an easy counting argument, among the n! permutations of $\{a_1, \dots, a_n\}$, there are exactly $2(i-1)(n-2)!$ permutations in which the maximum of the ranks of $a_1$ and $a_2$ is i. Therefore, $\Pr[\max\{r_1, r_2\} = i] = 2(i-1)(n-2)!/n! = 2(i-1)/(n(n-1))$. We then consider $\Pr[|\Lambda| = m \mid \max\{r_1, r_2\} = i]$. If $i < m + 2$, the probability is 0, because $|\Lambda \cup \{1, 2\}| \le \max\{r_1, r_2\}$ by definition. So assume $i \ge m + 2$. Suppose the permutation π has already been generated and satisfies $\max\{r_1, r_2\} = i$. Let $A = \{a_j : r_j \le i\}$. Note that $|A| = i$ and $a_1, a_2 \in A$. Now we randomly generate the permutation π' and examine the probability of $|\Lambda| = m$. Clearly, $|\Lambda| = m$ iff $\max\{\mathrm{rk}_{\pi'|_A}(a_1), \mathrm{rk}_{\pi'|_A}(a_2)\} = m + 2$, where $\pi'|_A$ is the permutation of A induced by π' (i.e., the permutation obtained by removing the points in $S \setminus A$ from π').
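Two combinatorial facts in Step 2 are easy to confirm exhaustively for small n: the count $2(i-1)(n-2)!$, and the identity behind the conditional step, namely that $|\Lambda| + 2$ equals the larger rank of $a_1, a_2$ in the induced permutation $\pi'|_A$. The sketch encodes point $a_k$ as the integer k (our convention) and enumerates all $(n!)^2$ configurations. It is a sanity check for small cases, not part of the proof, and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def max_rank_count(n, i):
    """Number of permutations of {1..n} in which max(rank of 1, rank of 2) == i (1-based)."""
    return sum(1 for p in permutations(range(1, n + 1))
               if max(p.index(1), p.index(2)) + 1 == i)

def check_lambda_identity(n):
    """Over all (n!)^2 configurations (pi, pi'), check that |Lambda| + 2 equals the
    max rank of points 1, 2 in pi' restricted to A = {a : rk_pi(a) <= max(rk_pi(1), rk_pi(2))}."""
    perms = list(permutations(range(1, n + 1)))
    for pi in perms:
        rx = {a: r for r, a in enumerate(pi, 1)}
        mx = max(rx[1], rx[2])
        A = {a for a in rx if rx[a] <= mx}
        for pi2 in perms:
            ry = {a: r for r, a in enumerate(pi2, 1)}
            my = max(ry[1], ry[2])
            lam = [k for k in range(3, n + 1) if rx[k] <= mx and ry[k] <= my]
            restricted = [a for a in pi2 if a in A]  # the induced permutation pi'|_A
            rank_in_A = max(restricted.index(1), restricted.index(2)) + 1
            if len(lam) + 2 != rank_in_A:
                return False
    return True

print([max_rank_count(5, i) for i in range(1, 6)], check_lambda_identity(4))
```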
Using the same counting argument as above, we have

A.3 Proof of Lemma 4
It suffices to consider the case in which $I_1, \dots, I_n$ are aligned vertical segments and $\mathcal{X} = \mathcal{U}^{\downarrow}$. Without loss of generality, we may assume $I_i = \{x_i\} \times [0, 1]$ where $x_1 < \cdots < x_n$ are real numbers. Fix $i, j \in \{1, \dots, n\}$ such that $i < j$. We first define some random variables. Let $y_k = a_k.y$ for all $k \in \{1, \dots, n\}$. Define $y_{\max} = \max\{y_i, y_j\}$ and $y_{\min} = \min\{y_i, y_j\}$. The 3-sided rectangle $U = [x_i, x_j] \times (-\infty, y_{\max}]$ is the minimal bottom-unbounded rectangle containing both $a_i$ and $a_j$, and clearly $(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})$ iff $(a_i, a_j)$ is the closest-pair in $S \cap U$. Define $\Lambda = \{k : i < k < j \text{ and } a_k \in U\}$, which is a random subset of $\{i+1, \dots, j-1\}$. We bound $\Pr[(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})]$ in three steps.
[Step 1] Let us first fix the values of $y_{\max}$ and Λ, and consider the corresponding conditional probability of the event $(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})$. Formally, we claim that, for all $\tilde{y} \in (0, 1]$ and all nonempty $K \subseteq \{i+1, \dots, j-1\}$, For convenience, we use $C_{\tilde{y},K}$ to denote the condition in the above conditional probability. Assume $|K| = m$. Let $\delta = y_{\max} - y_{\min}$ and $Y_K = \{y_k : k \in K\}$. We first notice that, under the condition $C_{\tilde{y},K}$, $U = [x_i, x_j] \times (-\infty, \tilde{y}]$. Thus the condition $C_{\tilde{y},K}$ is equivalent to saying that the maximum of $y_i, y_j$ is $\tilde{y}$, all $y_k$ for $k \in K$ are in $[0, \tilde{y}]$, and all $y_k$ for $k \in \{i+1, \dots, j-1\} \setminus K$ are in $(\tilde{y}, 1]$. As such, one can easily verify that, under the condition $C_{\tilde{y},K}$, δ is uniformly distributed on $[0, \tilde{y}]$, the m random numbers in $Y_K$ are also uniformly distributed on $[0, \tilde{y}]$, and these random numbers are independent of each other. In other words, if we consider a new random experiment in which we independently generate a random number δ' from the uniform distribution on $[0, \tilde{y}]$ (which corresponds to δ) and a random dataset $Y \propto [0, \tilde{y}]^m$ (which corresponds to $Y_K$), then we have So it suffices to bound $\Pr[\delta' \le \kappa(Y)]$ in the new experiment.
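The quantity left to bound in the new experiment, $\Pr[\delta' \le \kappa(Y)]$, is easy to estimate numerically. By scaling, one may take $\tilde{y} = 1$. The Monte Carlo sketch below compares the estimate with the closed form $1/(m^2 - 1)$ for $m \ge 2$. That closed form is our addition, taken from the classical law of uniform spacings ($\Pr[\kappa > t] = (1 - (m-1)t)^m$ on $[0, 1/(m-1)]$), and is not the bound stated in the paper, whose formula is not recoverable from this text. Function names are hypothetical.

```python
import random

def estimate(m, trials=20_000, seed=0):
    """Monte Carlo estimate of Pr[delta' <= kappa(Y)] in the new experiment of Step 1,
    rescaled to y~ = 1: delta' ~ U[0, 1] and Y = m >= 2 i.i.d. U[0, 1] numbers, where
    kappa(Y) is the closest-pair distance of Y (its smallest gap)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ys = sorted(rng.random() for _ in range(m))
        kappa = min(b - a for a, b in zip(ys, ys[1:]))
        hits += rng.random() <= kappa
    return hits / trials

def uniform_spacings_value(m):
    """Closed form (our addition): E[kappa] = 1/(m^2 - 1), and since delta' is uniform
    on [0, 1] and kappa <= 1, Pr[delta' <= kappa] = E[kappa]."""
    return 1 / (m * m - 1)

for m in (2, 3, 5, 10):
    print(m, round(estimate(m), 4), round(uniform_spacings_value(m), 4))
```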
We apply the formula where $p(\cdot)$ is the probability density function of $\kappa(Y)$. Since δ' is drawn from the uniform distribution on $[0, \tilde{y}]$, $\Pr[\delta' \le t] = t/\tilde{y}$ for $t \in [0, \tilde{y}]$. Thus,
[Step 2] In order to apply the result of Step 1 to bound the unconditional probability of $(a_i, a_j) \in \Phi(S, \mathcal{U}^{\downarrow})$, we need to bound $\Pr[|\Lambda| = m]$ for all $m \in \{0, \dots, j-i-1\}$. This is a combinatorial problem, because the random variable $|\Lambda|$ only depends on the ordering of $y_i, \dots, y_j$. There are $(j-i+1)!$ possible orderings, each of which can be represented by a permutation of $\{y_i, \dots, y_j\}$, and every ordering occurs with the same probability $1/(j-i+1)!$. For a permutation π of $\{y_i, \dots, y_j\}$, we write $\lambda_\pi = \max\{\mathrm{rk}_\pi(y_i), \mathrm{rk}_\pi(y_j)\}$, where the function rk computes the rank of an element in a permutation. Clearly, if the ordering is π, then $|\Lambda| = \lambda_\pi - 2$. As such, we can pass to a new random experiment in which we uniformly generate a permutation π of $\{y_i, \dots, y_j\}$ and study $\Pr[|\Lambda| = m]$ for $m \in \{0, \dots, j-i-1\}$ in this new experiment. Fixing $m \in \{0, \dots, j-i-1\}$, it follows that $|\Lambda| = m$ iff $\lambda_\pi = m + 2$. By an easy counting argument, among the $(j-i+1)!$ permutations of $\{y_i, \dots, y_j\}$, there are exactly $2(m+1)(j-i-1)!$ permutations in which the maximum of the ranks of $y_i$ and $y_j$ is $m + 2$. Therefore, $\Pr[|\Lambda| = m] = 2(m+1)(j-i-1)!/(j-i+1)! = 2(m+1)/((j-i+1)(j-i))$.
[Step 3] Using the results of the previous steps, the lemma can be readily proved. We apply the formula We
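The counting claim of Step 2 (exactly $2(m+1)(j-i-1)!$ of the $(j-i+1)!$ orderings give $|\Lambda| = m$) can be confirmed exhaustively with exact rational arithmetic for small $d = j - i$. The sketch enumerates rank assignments to $y_i, \dots, y_j$ and counts interior values below $\max\{y_i, y_j\}$ directly, so it also exercises the identity $|\Lambda| = \lambda_\pi - 2$. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def lambda_distribution(d):
    """Exact Pr[|Lambda| = m], m = 0..d-1, when y_i, ..., y_j (d = j - i, so d + 1 values)
    are in uniformly random order: |Lambda| counts the d - 1 interior indices k with
    y_k below max(y_i, y_j)."""
    counts = [0] * d
    for ranks in permutations(range(1, d + 2)):  # ranks[0] is y_i's, ranks[-1] is y_j's
        top = max(ranks[0], ranks[-1])
        counts[sum(1 for r in ranks[1:-1] if r < top)] += 1
    total = factorial(d + 1)
    return [Fraction(c, total) for c in counts]

def formula(d, m):
    """2(m+1)(j-i-1)! / (j-i+1)!  =  2(m+1) / ((d+1) d)."""
    return Fraction(2 * (m + 1), (d + 1) * d)

print(lambda_distribution(3), [formula(3, m) for m in range(3)])
```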

A.4 Proof of Lemma 10
It suffices to consider the case in which l is a vertical line. Then $\mathcal{P}_l \subseteq \mathcal{P}_v$, so a $(\mathcal{P}_v, k)$-TBEP data structure is naturally a $(\mathcal{P}_l, k)$-TBEP data structure. In a dataset $S \subseteq \mathbb{R}^2$, we say a point $a \in S$ is a candidate point if a is one of the k topmost/bottommost points in $S \cap P$ for some $P \in \mathcal{P}_l$. We denote by $\Psi(S)$ the subset of S consisting of all candidate points in S. Note that for any $P \in \mathcal{P}_l$, the k topmost/bottommost points in $S \cap P$ are exactly the k topmost/bottommost points in $\Psi(S) \cap P$. As such, answering a $(\mathcal{P}_l, k)$-TBEP query on S is equivalent to answering it on $\Psi(S)$. Therefore, we can define our $(\mathcal{P}_l, k)$-TBEP data structure $K_l$ as $K_l(S) = K_v(\Psi(S))$, where $K_v$ is the $(\mathcal{P}_v, k)$-TBEP data structure of Lemma 9. Now let $S \propto \prod_{i=1}^{n} I_i$ where $I_1, \dots, I_n$ are distinct aligned vertical segments. Assume $I_1, \dots, I_n$ are sorted from left to right, where $I_1, \dots, I_t$ are to the left of l and $I_{t+1}, \dots, I_n$ are to the right of l. Denote by $a_i \in S$ the random point drawn on $I_i$. We claim that $E[|\Psi(S)|] = O(\log n)$. Let $i \in \{1, \dots, t\}$. If $a_i \in \Psi(S)$, then it must be one of the k topmost/bottommost points in $S \cap P$ for some $P \in \mathcal{P}_l$. Note that any $P \in \mathcal{P}_l$ with $a_i \in P$ contains $a_i, \dots, a_t$. Therefore, if $a_i \in \Psi(S)$, it must be one of the k topmost/bottommost points among $a_i, \dots, a_t$. Since the random points are generated independently, the probability that $a_i$ is one of the k topmost/bottommost points among $a_i, \dots, a_t$ is exactly $\min\{2k/(t-i+1), 1\}$. As such, $\Pr[a_i \in \Psi(S)] \le 2k/(t-i+1)$. Using the same argument, we see that for $i \in \{t+1, \dots, n\}$, $\Pr[a_i \in \Psi(S)] \le 2k/(i-t)$. Since k is a constant, we have $E[|\Psi(S)|] = O(\log n)$. Thus, for a positive random variable x, hence we have $E[\mathrm{Qtime}(K_l(S))] = O(\log \log n)$. The case in which l is a horizontal line is handled symmetrically.
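The two summary steps of this proof can be written out as follows. The first line sums the per-point bounds using harmonic numbers $H_s = \sum_{r=1}^{s} 1/r$. The second applies Jensen's inequality to the concave function log. This is a sketch under an assumption of ours, since Lemma 9 is not part of this excerpt: that $K_v$ answers a query on N points in $O(\log(N+2))$ time for constant k. We write $\mathbb{E}$ for expectation.

```latex
\begin{align*}
\mathbb{E}\bigl[|\Psi(S)|\bigr]
  &\le \sum_{i=1}^{t} \frac{2k}{t-i+1} + \sum_{i=t+1}^{n} \frac{2k}{i-t}
   = 2k\,\bigl(H_t + H_{n-t}\bigr) = O(k\log n) = O(\log n), \\
\mathbb{E}\bigl[\mathrm{Qtime}(K_l(S))\bigr]
  &= O\bigl(\mathbb{E}[\log(|\Psi(S)| + 2)]\bigr)
   \le O\bigl(\log(\mathbb{E}[|\Psi(S)|] + 2)\bigr)
   = O(\log\log n).
\end{align*}
```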

A.5 Proof of Lemma 13
It suffices to consider the case in which $I_1, \dots, I_n$ are aligned vertical segments and $X = U^{\downarrow}$. Without loss of generality, assume $I_i = \{x_i\} \times [0, 1]$, where $x_1 < \cdots < x_n$ are real numbers. We denote by $a_i \in S$ the random point drawn on $I_i$. Also, suppose $a_1, \dots, a_t$ are to the left of $l$, while $a_{t+1}, \dots, a_n$ are to the right of $l$. Let $E_{i,j}$ be the event that $(a_i, a_j) \in \Phi(S, U^{\downarrow})$. Then we have
$$\mathbb{E}[|\Phi_l(S, U^{\downarrow})|] = \sum_{i=1}^{t} \sum_{j=t+1}^{n} \Pr[E_{i,j}].$$
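By applying Lemma 4 and the fact

Define $\Psi = \Phi_l(S, U^{\downarrow}) \times \Phi_l(S, U^{\downarrow})$, i.e., the Cartesian product of two copies of $\Phi_l(S, U^{\downarrow})$. Then $|\Psi| = |\Phi_l(S, U^{\downarrow})|^2$, so it suffices to bound $\mathbb{E}[|\Psi|]$. Clearly, for $i, i' \in \{1, \dots, t\}$ and $j, j' \in \{t+1, \dots, n\}$, $((a_i, a_j), (a_{i'}, a_{j'})) \in \Psi$ iff $E_{i,j} \wedge E_{i',j'}$. Therefore, we have
$$\mathbb{E}[|\Psi|] = \sum_{i,i'=1}^{t} \sum_{j,j'=t+1}^{n} \Pr[E_{i,j} \wedge E_{i',j'}].$$

[Case 2] We then consider the case in which $i \ne i'$ and $j \ne j'$. Let $\delta = j - i$ and $\delta' = j' - i'$. In this case, we claim that $\Pr[E_{i,j} \wedge E_{i',j'}] = O((\log \delta \log \delta')/(\delta \delta')^2)$. To prove this, we may assume that $\delta$ is sufficiently large, say $\delta \ge 5$. Indeed, when $\delta < 5$, what we want is $\Pr[E_{i,j} \wedge E_{i',j'}] = O(\log \delta'/\delta'^2)$, which is true as $\Pr[E_{i,j} \wedge E_{i',j'}] \le \Pr[E_{i',j'}] = O(\log \delta'/\delta'^2)$ by Lemma 4. For the same reason, we may also assume $\delta' \ge 5$. Let $S_0$ (resp., $S_1$) be the subset of $S$ consisting of $a_i, a_j$ (resp., $a_{i'}, a_{j'}$) and all the random points in $S \setminus \{a_i, a_j, a_{i'}, a_{j'}\}$ with even indices (resp., odd indices). Clearly, $S = S_0 \cup S_1$ and $S_0 \cap S_1 = \emptyset$. Define $F_0$ (resp., $F_1$) as the event $(a_i, a_j) \in \Phi(S_0, U^{\downarrow})$ (resp., $(a_{i'}, a_{j'}) \in \Phi(S_1, U^{\downarrow})$). Since $S_0$ and $S_1$ are subsets of $S$, $E_{i,j}$ (resp., $E_{i',j'}$) happens only if $F_0$ (resp., $F_1$) happens. Besides, $F_0$ and $F_1$ are independent events, because $S_0 \cap S_1 = \emptyset$. Thus,
$$\Pr[E_{i,j} \wedge E_{i',j'}] \le \Pr[F_0 \wedge F_1] = \Pr[F_0] \cdot \Pr[F_1] = O\Big(\frac{\log \delta}{\delta^{2}}\Big) \cdot O\Big(\frac{\log \delta'}{\delta'^{2}}\Big),$$
where the last step applies Lemma 4 to $S_0$ and $S_1$ (by construction, $S_0$ has $\Theta(\delta)$ points whose x-coordinates lie between $a_i.x$ and $a_j.x$, and $S_1$ has $\Theta(\delta')$ points whose x-coordinates lie between $a_{i'}.x$ and $a_{j'}.x$).

[Case 3] The subtlest case is that $i = i'$ and $j \ne j'$, or symmetrically $i \ne i'$ and $j = j'$. Assume $i = i'$ and $j > j'$. Let $\delta = j - i$ and $\delta' = j' - i$. We claim that $\Pr[E_{i,j} \wedge E_{i,j'}] = O(\log \delta/(\delta^2 \delta'))$. Again, we may assume $\delta$ and $\delta'$ are sufficiently large, say $\delta > \delta' \ge 5$. Let $S_0$ (resp., $S_1$) be the subset of $S$ consisting of $a_i, a_j$ (resp., $a_i, a_{j'}$) and all the random points in $S \setminus \{a_i, a_j, a_{j'}\}$ with even indices (resp., odd indices). Clearly, $S = S_0 \cup S_1$ and $S_0 \cap S_1 = \{a_i\}$. Define $F_0$ (resp., $F_1$) as the event $(a_i, a_j) \in \Phi(S_0, U^{\downarrow})$ (resp., $(a_i, a_{j'}) \in \Phi(S_1, U^{\downarrow})$). As in Case 2, we have $\Pr[E_{i,j} \wedge E_{i,j'}] \le \Pr[F_0 \wedge F_1]$. However, $\Pr[F_0 \wedge F_1] \ne \Pr[F_0] \cdot \Pr[F_1]$ in general, because $F_0$ and $F_1$ are not independent (both of them depend on $a_i.y$). To handle this issue, we observe that
$$\Pr[F_0 \wedge F_1] = \int_0^1 \Pr[F_0 \wedge F_1 \mid a_i.y = t]\,dt,$$
since the distribution of $a_i.y$ is the uniform distribution on $[0, 1]$.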
Note that under the condition $a_i.y = t$, $F_0$ and $F_1$ are in fact independent. Indeed, when $a_i.y$ is fixed, $F_0$ (resp., $F_1$) only depends on the y-coordinates of the random points in $S_0 \setminus \{a_i\}$ (resp., $S_1 \setminus \{a_i\}$). Therefore, we can write
$$\Pr[F_0 \wedge F_1] = \int_0^1 \Pr[F_0 \mid a_i.y = t] \cdot \Pr[F_1 \mid a_i.y = t]\,dt.$$
We first consider $\Pr[F_1 \mid a_i.y = t]$ for a fixed $t \in [0, 1]$. Let $S_1' = S_1 \cap \{a_i, \dots, a_{j'}\}$, i.e., $S_1'$ is the subset of $S_1$ consisting of all the points whose x-coordinates are in $[x_i, x_{j'}]$. We notice that $F_1$ happens only if $a_{j'}$ is y-adjacent to $a_i$ in $S_1'$, i.e., no other point of $S_1'$ has its y-coordinate between $a_i.y$ and $a_{j'}.y$. Indeed, if there exists $a \in S_1' \setminus \{a_i, a_{j'}\}$ such that $a.y$ is between $a_i.y$ and $a_{j'}.y$, then $\mathrm{dist}(a_i, a) < \mathrm{dist}(a_i, a_{j'})$ and $a$ is in the minimal bottom-unbounded 3-sided rectangle containing $a_i, a_{j'}$, which implies that $F_1$ does not happen. We claim that, under the condition $a_i.y = t$, the probability that $a_{j'}$ is y-adjacent to $a_i$ in $S_1'$ is $O(1/\delta')$. The y-coordinates of the random points in $S_1' \setminus \{a_i\}$ are independently drawn from the uniform distribution on $[0, 1]$, so by symmetry every point in $S_1' \setminus \{a_i\}$ has the same probability (say $p$) of being y-adjacent to $a_i$. Let $r$ be the number of points in $S_1' \setminus \{a_i\}$ that are y-adjacent to $a_i$, which is a random variable. Then $\mathbb{E}[r] = p \cdot |S_1' \setminus \{a_i\}|$. But we always have $r \le 2$, since at most two points (one above and one below) can be y-adjacent to $a_i$. In particular, $\mathbb{E}[r] \le 2$ and $p = O(1/|S_1' \setminus \{a_i\}|)$. By construction, we have $|S_1' \setminus \{a_i\}| = \Theta(\delta')$ (recall the assumption $\delta' \ge 5$). It follows that $p = O(1/\delta')$, i.e., the probability that $a_{j'}$ is y-adjacent to $a_i$ in $S_1'$ is $O(1/\delta')$. By the previous argument, $\Pr[F_1 \mid a_i.y = t] = O(1/\delta')$. Therefore,
$$\Pr[F_0 \wedge F_1] = O\Big(\frac{1}{\delta'}\Big) \cdot \int_0^1 \Pr[F_0 \mid a_i.y = t]\,dt.$$
Note that $\int_0^1 \Pr[F_0 \mid a_i.y = t]\,dt = \Pr[F_0]$. By construction, there are $\Theta(\delta)$ points in $S_0$ whose x-coordinates are between $a_i.x$ and $a_j.x$ (recall the assumption $\delta \ge 5$). Thus, Lemma 4 implies $\Pr[F_0] = O(\log \delta/\delta^2)$.
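The symmetry argument for $p = O(1/\delta')$ can be made concrete. For one fixed point among $m$ i.i.d. uniform y-values on $[0,1]$, the probability that no other value lies strictly between it and a fixed $t$ is $\int_0^1 (1 - |y - t|)^{m-1}\,dy = (2 - t^m - (1-t)^m)/m \le 2/m$. This closed form is derived here only for illustration; the Python sketch below (helper names are hypothetical) compares it with a Monte Carlo estimate.

```python
import random


def adjacency_prob_exact(t, m):
    """P(a fixed point among m i.i.d. Uniform[0, 1] y-values is y-adjacent
    to the fixed value t, i.e., no other value lies strictly between)."""
    return (2 - t**m - (1 - t) ** m) / m


def adjacency_prob_mc(t, m, trials, rng):
    hits = 0
    for _ in range(trials):
        ys = [rng.random() for _ in range(m)]
        lo, hi = sorted((ys[0], t))  # ys[0] plays the role of the fixed point
        if not any(lo < y < hi for y in ys[1:]):
            hits += 1
    return hits / trials


rng = random.Random(1)
for m in (2, 5, 10, 40):
    exact = adjacency_prob_exact(0.3, m)
    est = adjacency_prob_mc(0.3, m, 10000, rng)
    print(f"m={m:3d}  exact={exact:.4f}  mc={est:.4f}  2/m={2 / m:.4f}")
```

The bound $2/m$ is exactly the statement $\mathbb{E}[r] \le 2$ divided among $m$ symmetric points.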
Plugging this into the equation above, we have $\Pr[F_0 \wedge F_1] = O(\log \delta/(\delta^2 \delta'))$. As a result, $\Pr[E_{i,j} \wedge E_{i,j'}] = O(\log \delta/(\delta^2 \delta'))$. The sum of the terms $\Pr[E_{i,j} \wedge E_{i',j'}]$ satisfying $i = i'$ and $j > j'$ is $O(\log^3 n)$, as one can easily verify. For the same reason, the terms satisfying $i = i'$ and $j < j'$ also sum up to $O(\log^3 n)$. The symmetric case $i \ne i'$ and $j = j'$ is handled in the same fashion.
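For completeness, here is one way to carry out the verification of the Case 3 sum, with $\delta = j - i$ and $\delta' = j' - i$. For a fixed $\delta$ there are at most $\delta$ choices of $i$ (since $i \le t < i + \delta$), and $\delta'$ ranges over a subset of $\{1, \dots, \delta - 1\}$. Pairs with small $\delta$ or $\delta'$ obey the same bound up to constants, because then $\Pr[E_{i,j} \wedge E_{i,j'}] \le \Pr[E_{i,j}] = O(\log \delta/\delta^2)$.

```latex
\sum_{i \le t < j' < j} \frac{\log \delta}{\delta^{2}\,\delta'}
  \;\le\; \sum_{\delta=2}^{n} \delta \cdot \frac{\log \delta}{\delta^{2}}
          \sum_{\delta'=1}^{\delta-1} \frac{1}{\delta'}
  \;=\; O\Big(\sum_{\delta=2}^{n} \frac{\log^{2} \delta}{\delta}\Big)
  \;=\; O(\log^{3} n).
```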
Combining all the cases, we conclude that

A.6 Proof of Theorem 14
The R-RCP data structure $D_2$ is described in Section 4.3.
[Query time] We first analyze the (worst-case) query time. When answering a query, we first find the splitting nodes $u$ and $v$ in the 2D range tree. As argued in the proof of Theorem 11, this can be done in $O(\log n)$ time. Then we query the sub-structures stored at $v$ to compute $\phi, \phi_\alpha, \phi_\beta$. All the sub-structures have $O(\log n)$ query time, and we only need a constant number of queries. Therefore, this step takes $O(\log n)$ time, and hence the overall query time is also $O(\log n)$.
[Average-case space cost] We now analyze the average-case space cost of $D_2$. Let $R$ be an axis-parallel rectangle and $S \propto R^n$. We denote by $a_1, \dots, a_n$ the $n$ random points in $S$. The data structure instance $D_2(S)$ is essentially a 2D range tree built on $S$ with some sub-structures stored at secondary nodes. Note that a 2D range tree built on a set of $n$ points in $\mathbb{R}^2$ has a fixed tree structure independent of the locations of the points. That is, although $D_2(S)$ is a random data structure instance depending on the random dataset $S$, the 2D range tree in $D_2(S)$ has a deterministic structure. As such, we can view $D_2(S)$ as a fixed 2D range tree with random sub-structures. Let $T$ denote the primary tree of this 2D range tree and $T_u$ denote the secondary tree at the node $u \in T$, as in Section 4. To bound $\mathbb{E}[\mathrm{Space}(D_2(S))]$, it suffices to bound the average-case space cost of the sub-structures stored at each secondary node.
For convenience of exposition, we introduce some notation. Let $u \in T$ be a primary node. Suppose the $n$ leaves of $T$ are $\mathrm{lf}_1, \dots, \mathrm{lf}_n$, sorted from left to right. Then the leaves in the subtree rooted at $u$ must be $\mathrm{lf}_\alpha, \dots, \mathrm{lf}_\beta$ for some $\alpha, \beta \in \{1, \dots, n\}$ with $\alpha \le \beta$. We then write $\mathrm{range}(u) = [\alpha : \beta]$ and $\mathrm{size}(u) = \beta - \alpha + 1$. Due to the construction of a 2D range tree, we always have $|S(u)| = \mathrm{size}(u)$ no matter what the random dataset $S$ is. Furthermore, if $\mathrm{range}(u) = [\alpha : \beta]$, then $S(u)$ contains exactly the points in $S$ with x-ranks $\alpha, \dots, \beta$ (we say a point has x-rank $i$ in $S$ if it is the $i$-th leftmost point in $S$). Let $v \in T_u$ be a secondary node. We can define $\mathrm{range}(v)$ and $\mathrm{size}(v)$ in the same way (just by replacing $T$ with $T_u$). Also, we always have $|S(v)| = \mathrm{size}(v)$. If $\mathrm{range}(v) = [\alpha : \beta]$, then $S(v)$ contains exactly the points in $S(u)$ with y-ranks $\alpha, \dots, \beta$ (we say a point has y-rank $i$ in $S(u)$ if it is the $i$-th bottommost point in $S(u)$). In what follows, we fix a secondary node $v \in T_u$ and analyze the sub-structures stored at $v$. Let $u'$ (resp., $v'$) denote the left child of $u$ (resp., $v$). Suppose $\mathrm{range}(u) = [\alpha : \beta]$, $\mathrm{range}(u') = [\alpha : \beta']$ (where $\beta' < \beta$), $\mathrm{range}(v) = [\gamma : \xi]$, and $\mathrm{range}(v') = [\gamma : \xi']$ (where $\xi' < \xi$).
We want to use Theorem 3, Lemma 10, and Lemma 13 to bound the average-case space cost of the Q-RCP, TBEP/LREP, and U-RSS sub-structures, respectively. However, before applying these results, there is a crucial issue to be handled. Recall that in Theorem 3, Lemma 10, and Lemma 13, we assume the random dataset is generated either from the uniform distribution on a rectangle ($S \propto R^n$) or from the uniform distributions on a set of aligned segments ($S \propto \prod_{i=1}^{n} I_i$). Unfortunately, here the underlying datasets of the sub-structures are $S_1(v), \dots, S_4(v)$ and S (v), S (v), S (v), S (v); these random point-sets are neither generated (independently and uniformly) from a rectangle nor generated from aligned segments. For instance, we cannot directly use Theorem 3 to deduce $\mathbb{E}[\mathrm{Space}(A(S_1(v)))] = O(\log^2 |S_1(v)|)$, since $S_1(v)$ is not uniformly generated from a rectangle, and even its size $|S_1(v)|$ is not a fixed number ($|S_1(v)|$ varies with $S$). The main focus of the rest of this proof is to handle this issue.
We first consider $S_1(v)$. Note that $S_1(v) = S(u') \cap S(v')$ by definition. We want to bound $\mathbb{E}[\mathrm{Space}(A(S_1(v)))]$. Our basic idea is the following: we reduce this expectation to conditional expectations in which $S_1(v)$ can be viewed as uniformly and independently generated from an axis-parallel rectangle, so that Theorem 3 applies. Since the points in $\{a^*_j : j \in J\}$ belong to $S(u')$ (as $S^*$ makes $\Lambda = J$), the $\alpha - 1$ leftmost points in $F \cup \{a^*_j : j \in J\}$ (which correspond to the points in $S$ to the left of $S(u')$) must be contained in $F$, and hence they are just the $\alpha - 1$ leftmost points in $F$ (which we denote by $F_1$). This implies $a^*_j.x \ge x_1$ for all $j \in J$. Similarly, the $n - \beta'$ rightmost points in $F \cup \{a^*_j : j \in J\}$ (which correspond to the points in $S$ to the right of $S(u')$) must be the $n - \beta'$ rightmost points in $F$ (which we denote by $F_2$). This implies $a^*_j.x \le x_2$ for all $j \in J$. Clearly, the points corresponding to $S(u)$ are exactly those in $F \cup \{a^*_j : j \in J\}$. Since the points in $\{a^*_j : j \in J\}$ belong to $S(v')$ (as $S^*$ makes $\Lambda = J$), the $\gamma - 1$ bottommost points in $F \cup \{a^*_j : j \in J\}$ (which correspond to the points in $S(u)$ below $S(v')$) must be contained in $F$, and hence they are just the $\gamma - 1$ bottommost points in $F$ (which we denote by $F_1'$). This implies $a^*_j.y \ge y_1$ for all $j \in J$. Similarly, the $\mathrm{size}(u) - \xi'$ topmost points in $F \cup \{a^*_j : j \in J\}$ (which correspond to the points in $S(u)$ above $S(v')$) must be the $\mathrm{size}(u) - \xi'$ topmost points in $F$ (which we denote by $F_2'$). This implies $a^*_j.y \le y_2$ for all $j \in J$. Now we already see $a^*_j \in R'$ for all $j \in J$. It follows that $E_{J,f}$ happens only if $a_i = f(i)$ for all $i \in [n] \setminus J$ and $a_j \in R'$ for all $j \in J$. Furthermore, we note that $F_1 \cup F_2 \cup F_1' \cup F_2'$ corresponds to $S \setminus S_1(v)$. Since $S^*$ makes $\Lambda = J$, we must have $F = F_1 \cup F_2 \cup F_1' \cup F_2'$ (this argument relies on the existence of an instance $S^*$ making $E_{J,f}$ happen, i.e., it may fail if $(J, f)$ is not a legal configuration). We then use this fact to show the "if" part. Let $S^* : \{a_i = a^*_i\}_{i \in [n]}$ be an instance of $S$ satisfying $a^*_i = f(i)$ for all $i \in [n] \setminus J$ and $a^*_j \in R'$ for all $j \in J$. Then $\{a^*_1, \dots, a^*_n\} = F \cup \{a^*_j : j \in J\}$. We look at the subsets $F_1, F_2, F_1', F_2'$ of $F$. Since $a^*_j.x \in [x_1, x_2]$ for all $j \in J$, $F_1$ (resp., $F_2$) contains exactly the $\alpha - 1$ leftmost points (resp., $n - \beta'$ rightmost points) in $F \cup \{a^*_j : j \in J\}$, which correspond to the points to the left (resp., right) of $S(u')$. Similarly, since $a^*_j.y \in [y_1, y_2]$ for all $j \in J$, $F_1'$ (resp., $F_2'$) contains exactly the $\gamma - 1$ bottommost points (resp., $\mathrm{size}(u) - \xi'$ topmost points) in $F \cup \{a^*_j : j \in J\}$, which correspond to the points in $S(u)$ below (resp., above) $S(v')$. Then $F = F_1 \cup F_2 \cup F_1' \cup F_2'$ corresponds to $S \setminus S_1(v)$. The remaining points, which correspond to $S_1(v)$, are exactly those in $\{a^*_j : j \in J\}$.
Therefore, $\Lambda = J$ and $S^*$ makes $E_{J,f}$ happen. Now we see that $E_{J,f}$ happens iff $a_i = f(i)$ for all $i \in [n] \setminus J$ and $a_j \in R'$ for all $j \in J$. As such, under the condition $E_{J,f}$, the random points in $S_J = \{a_j : j \in J\}$ can be viewed as independently drawn from the uniform distribution on $R'$. Applying Theorem 3, we have $\mathbb{E}[\mathrm{Space}(A(S_1(v))) \mid E_{J,f}] = \mathbb{E}[\mathrm{Space}(A(S_J)) \mid E_{J,f}] = O(\log^2 |J|)$. Noting that $|J| \le \mathrm{size}(v') \le \mathrm{size}(v)$ if $(J, f)$ is a legal configuration, we can deduce
$$\mathbb{E}[\mathrm{Space}(A(S_1(v))) \mid E] = O(\log^2 \mathrm{size}(v)) \quad \text{for all } E \in \mathcal{E}, \qquad (8)$$
where $\mathcal{E} = \{E_{J,f} : (J, f) \text{ is a legal configuration}\}$. Using this result, we further show that $\mathbb{E}[\mathrm{Space}(A(S_1(v)))] = O(\log^2 \mathrm{size}(v))$. Clearly, $\mathcal{E}$ is a collection of mutually disjoint (i.e., mutually exclusive) events. Furthermore, whenever $a_1, \dots, a_n$ have distinct x-coordinates and distinct y-coordinates, some $E_{J,f} \in \mathcal{E}$ happens. That is, $\mathcal{E}$ is a collection of almost collectively exhaustive events, in the sense that with probability 1 some $E_{J,f} \in \mathcal{E}$ happens. Since the events in $\mathcal{E}$ are mutually disjoint and almost collectively exhaustive, $\mathbb{E}[\mathrm{Space}(A(S_1(v)))] = O(\log^2 \mathrm{size}(v))$ follows directly from the law of total expectation and Equation 8. Clearly, the same idea applies to bound $\mathbb{E}[\mathrm{Space}(A(S_i(v)))]$ for all $i \in \{1, \dots, 4\}$. Next, we consider S (v). We want to bound E[Space(K lu (S (v)))] and E[Space(C(Φ (v)))], where Φ (v) = Φ lu (S (v), U ↓ ) by definition. The idea is exactly the same as in the last paragraph: we reduce to conditional expectations in which S (v) can be viewed as independently generated from a set of aligned (vertical) segments, so that Lemma 10 and Lemma 13 apply. We change the definition of $\Lambda$ in the last paragraph to Λ = {i : a i ∈ S (v)}, and again define $E_{J,f}$ accordingly. Using the same argument as in the last paragraph, one can easily verify that $E_{J,f}$ happens iff $a_i = f(i)$ for all $i \in [n] \setminus J$ and $a_j \in R'$ for all $j \in J$. For an injective function $g : J \to (x_1, x_2)$, we further define an event $E_{J,f,g}$; it happens iff $a_i = f(i)$ for all $i \in [n] \setminus J$ and $a_j \in \{g(j)\} \times [y_1, y_2]$ for all $j \in J$.
Thus, under $E_{J,f,g}$, the $|J|$ random points in $S_J = \{a_j : j \in J\}$ can be viewed as independently drawn from the $|J|$ aligned vertical segments in $\{\{g(j)\} \times [y_1, y_2] : j \in J\}$. To apply Lemma 10 and Lemma 13, we still need to consider one thing: the line $l_u$. The line $l_u$ is a random vertical line depending on $S$. However, under $E_{J,f,g}$, $l_u$ is fixed. Indeed, under $E_{J,f,g}$, $S(u)$ corresponds to $F \cup \{a_j : j \in J\}$. Thus, the x-coordinates of the points in $S(u)$ are fixed under $E_{J,f,g}$, and hence $l_u$ is fixed. As such, we are able to apply Lemma 10 and Lemma 13 to bound the corresponding conditional expectations under $E_{J,f,g}$. Note that if $E_{J,f}$ happens, then with probability 1 some $E_{J,f,g}$ happens. Therefore, the collection $\mathcal{E}' = \{E_{J,f,g}\}$, which consists of all $E_{J,f,g}$ where $(J, f)$ is a legal configuration and $g : J \to (x_1, x_2)$ is an injective function with range $(x_1, x_2)$ depending on $(J, f)$, is a collection of mutually disjoint and almost collectively exhaustive events. By the law of total expectation, we immediately have E[Space(K lu (S (v)))] = O(log size(v)) and E[Space(C(Φ (v)))] = O(log² size(v)). The expected space cost of the sub-structures built on S (v) can be bounded using the same argument. Also, one can handle S (v) and S (v) in a similar way. The only difference is that, in the event $E_{J,f,g}$, the function $g$ should indicate the y-coordinates of the points in $\{a_j : j \in J\}$ instead of the x-coordinates.
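Summing a poly-logarithmic cost over the nodes of a balanced secondary tree yields a total linear in $\mathrm{size}(u)$, because most nodes are small. The Python sketch below checks this numerically for a perfectly balanced binary tree, using $\log_2^2(\mathrm{size}(v) + 1)$ as a hypothetical stand-in for the poly-logarithmic per-node cost; the function name is illustrative only.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def polylog_tree_cost(size: int, c: int = 2) -> float:
    """Total of log2(size(v) + 1) ** c over all nodes v of a balanced binary
    tree with `size` leaves (the left child receives ceil(size / 2) leaves)."""
    own = math.log2(size + 1) ** c
    if size == 1:
        return own
    left = (size + 1) // 2
    return own + polylog_tree_cost(left, c) + polylog_tree_cost(size - left, c)


for exp in (8, 12, 16, 20):
    n = 2**exp
    print(f"size(u) = 2^{exp}: total / size(u) = {polylog_tree_cost(n) / n:.3f}")
```

The ratio converges (to roughly 8.3 for this stand-in cost), reflecting that there are about $\mathrm{size}(u)/2^h$ nodes of size $2^h$ and $\sum_{h \ge 0} h^c/2^h$ is a constant.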
Once we know that the expected space cost of all the sub-structures stored at $v$ is poly-logarithmic in $\mathrm{size}(v)$, we can deduce that the expected space cost of each secondary tree $T_u$ (with the sub-structures) is $O(\mathrm{size}(u))$: since $T_u$ is balanced, it has $O(\mathrm{size}(u)/2^h)$ nodes $v$ with $\mathrm{size}(v) = \Theta(2^h)$, so the total is $O(\mathrm{size}(u) \sum_{h \ge 0} \mathrm{polylog}(2^h)/2^h) = O(\mathrm{size}(u))$. As a result, $\mathbb{E}[\mathrm{Space}(D_2(S))] = O(n \log n)$.

Suppose the $n$ random points in $S$ are $a_1, \dots, a_n$. Let $E_{i,j}$ be the event that $(a_i, a_j) \in \Phi(S, \mathcal{H})$, and observe that
$$\mathbb{E}[|\Phi(S, \mathcal{H})|] = \sum_{1 \le i < j \le n} \Pr[E_{i,j}].$$
All $\Pr[E_{i,j}]$ in the above equation are the same, which implies $\mathbb{E}[|\Phi(S, \mathcal{H})|] = O(n^2 \cdot \Pr[E_{1,2}])$. Thus, it suffices to bound $\Pr[E_{1,2}]$. As in the proof of Lemma 2, we define the random variables $x_{\max} = \max\{a_1.x, a_2.x\}$, $y_{\max} = \max\{a_1.y, a_2.y\}$, $x_{\min} = \min\{a_1.x, a_2.x\}$, $y_{\min} = \min\{a_1.y, a_2.y\}$, $Q = (-\infty, x_{\max}] \times (-\infty, y_{\max}]$, and $\Lambda = \{i \ge 3 : a_i \in Q\}$. We also define $Q' = (-\infty, x_{\max}/2] \times (-\infty, y_{\max}/2]$ and $\Lambda' = \{i \ge 3 : a_i \in Q'\}$. We achieve the bound for $\Pr[E_{1,2}]$ through four steps.

[Step 1] We begin by establishing the following key observation: for any $H \in \mathcal{H}$, $a_1, a_2 \in H$ implies $Q' \subseteq H$. To see this, let $H \in \mathcal{H}$ and assume $a_1, a_2 \in H$. If $\{a_1, a_2\} = \{(x_{\min}, y_{\min}), (x_{\max}, y_{\max})\}$, then $H$ contains the point $(x_{\max}, y_{\max})$. This implies that $H$ contains the point $(x_{\max}/2, y_{\max}/2)$ and hence contains $Q'$, because $H = l^{\downarrow}$ for a line $l$ of non-positive slope. If $\{a_1, a_2\} = \{(x_{\min}, y_{\max}), (x_{\max}, y_{\min})\}$, then $H$ contains the pentagon $P$ whose vertices are $(0, 0)$, $(x_{\max}, 0)$, $(x_{\max}, y_{\min})$, $(x_{\min}, y_{\max})$, $(0, y_{\max})$. Note that $P$ contains the point $(x_{\max}/2, y_{\max}/2)$, which implies that $H$ also contains $(x_{\max}/2, y_{\max}/2)$ and hence contains $Q'$.

[Step 2] Based on the observation in Step 1, we prove a result similar to Equation 3 in the proof of Lemma 2. We claim that for all $\tilde{x} \in (0, 1]$, all $\tilde{y} \in (0, \Delta]$, and all nonempty $J \subseteq \{3, \dots, n\}$, The argument for proving this is similar to that for proving Equation 3. We use $C_{\tilde{x},\tilde{y},J}$ to denote the condition in the above conditional probability. Assume $|J| = k$.
Let $\delta_x = x_{\max} - x_{\min}$ and $\delta_y = y_{\max} - y_{\min}$. Since any halfplane $H \in \mathcal{H}$ containing $a_1, a_2$ must contain $Q'$, $E_{1,2}$ happens only if $\delta_x \le \kappa(S_J)$ and $\delta_y \le \kappa(S_J)$, where $S_J = \{a_j : j \in J\}$. So it suffices to bound $\Pr[(\delta_x \le \kappa(S_J)) \wedge (\delta_y \le \kappa(S_J)) \mid C_{\tilde{x},\tilde{y},J}]$. Under the condition $C_{\tilde{x},\tilde{y},J}$, $Q'$ is just $(-\infty, \tilde{x}/2] \times (-\infty, \tilde{y}/2]$. Thus the condition $C_{\tilde{x},\tilde{y},J}$ is equivalent to saying that the maximum of the x-coordinates (resp., y-coordinates) of $a_1, a_2$ is $\tilde{x}$ (resp., $\tilde{y}$), all $a_j$ for $j \in J$ are contained in the rectangle $R' = [0, \tilde{x}/2] \times [0, \tilde{y}/2]$, and all $a_j$ for $j \in \{3, \dots, n\} \setminus J$ are contained in $R \setminus R'$. As such, one can easily verify that, under the condition $C_{\tilde{x},\tilde{y},J}$, the distribution of the random number $\delta_x$ (resp., $\delta_y$) is the uniform distribution on the interval $[0, \tilde{x}]$ (resp., $[0, \tilde{y}]$), and the distribution of each of the $k$ random points in $S_J$ is the uniform distribution on $R'$; furthermore, these random numbers/points are independent of each other. That is, if we consider a new random experiment in which we independently generate two random numbers $\delta_x', \delta_y'$ from the uniform distributions on $[0, \tilde{x}]$ and $[0, \tilde{y}]$, respectively (which correspond to $\delta_x, \delta_y$), and a random dataset $S' \propto (R')^k$ (which corresponds to $S_J$), then we have
$$\Pr[(\delta_x' \le \kappa(S')) \wedge (\delta_y' \le \kappa(S'))] = \Pr[(\delta_x \le \kappa(S_J)) \wedge (\delta_y \le \kappa(S_J)) \mid C_{\tilde{x},\tilde{y},J}].$$
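The key observation of Step 1 can be sanity-checked numerically. The Python sketch below assumes, as in Step 1, that $H = l^{\downarrow}$ for a line $l$ of non-positive slope, written here as $y \le c - s x$ with $s \ge 0$, and that $a_1, a_2$ lie in the unit square; the function names and the sampling ranges for $c$ and $s$ are arbitrary illustrative choices. It counts how often $a_1, a_2 \in H$ but $(x_{\max}/2, y_{\max}/2) \notin H$, which Step 1 says never happens.

```python
import random


def in_lower_halfplane(p, c, s, eps=0.0):
    """Is p in H = {(x, y) : y <= c - s * x}? (s >= 0: non-positive slope)"""
    return p[1] <= c - s * p[0] + eps


def check_step1(trials, seed=0):
    """Return (number of trials with a1, a2 in H, number of violations)."""
    rng = random.Random(seed)
    checked = violations = 0
    for _ in range(trials):
        a1 = (rng.random(), rng.random())
        a2 = (rng.random(), rng.random())
        s = rng.uniform(0.0, 5.0)
        c = rng.uniform(0.0, 3.0)
        if in_lower_halfplane(a1, c, s) and in_lower_halfplane(a2, c, s):
            checked += 1
            q = (max(a1[0], a2[0]) / 2, max(a1[1], a2[1]) / 2)
            # the tiny tolerance only guards against floating-point rounding
            if not in_lower_halfplane(q, c, s, eps=1e-12):
                violations += 1
    return checked, violations


print(check_step1(100_000))
```

The check mirrors the pentagon argument: $y_{\max} \le c$ and $s\,x_{\max} \le c$ both hold, and averaging them gives $y_{\max}/2 + s\,x_{\max}/2 \le c$.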

B Implementation details of the preprocessing algorithm
We now discuss the implementation details of the preprocessing algorithm. Let $T$ be the BST currently storing $\partial F_{i-1}$.
The first thing we need to show is how, given two points $u, v \in \partial F_{i-1}$, to report in left-to-right order the fractions of $\partial F_{i-1}$ intersecting $\sigma$, where $\sigma$ is the portion of $\partial F_{i-1}$ between $u$ and $v$. Clearly, we are reporting a set of consecutive fractions of $\partial F_{i-1}$. We can find in $T$ the node corresponding to the leftmost fraction to be reported in $O(\log |T|)$ time, and report this fraction. After this, we simply perform an in-order traversal starting from this node to report the other fractions in left-to-right order. Since the fractions to be reported are consecutive, the time cost is $O(\log |T| + k)$, where $k$ is the number of reported fractions.
The second thing we need to show is how to update $T$. At the beginning of the $i$-th iteration, $T$ stores $\partial F_{i-1}$, and we need to update it to $\partial F_i$. Clearly, $\partial F_i$ is obtained by using the connected components of $\partial W_i \cap F_{i-1}$ to replace the corresponding portions of $\partial F_{i-1}$. We consider the components of $\partial W_i \cap F_{i-1}$ one by one (by the proof of Lemma 16, there is only a constant number of components to consider). Let $\xi$ be a component, which must be an x-monotone polygonal chain consisting of at most two pieces. For convenience, assume $\xi$ has a left endpoint $u$ and a right endpoint $v$. It is clear that $u, v \in \partial F_{i-1}$. We need to replace the portion $\sigma$ of $\partial F_{i-1}$ between $u$ and $v$ with $\xi$; we call this a Replace operation. To achieve this, we first report the fractions of $\partial F_{i-1}$ intersecting $\sigma$, using the approach described above. Suppose the reported fractions are $\gamma_1, \dots, \gamma_k$, sorted in left-to-right order. Then $u \in \gamma_1$ and $v \in \gamma_k$. Clearly, the fractions $\gamma_2, \dots, \gamma_{k-1}$ should be removed, as they disappear after replacing $\sigma$ with $\xi$. This can be done by deleting the corresponding nodes from $T$ via $k - 2$ BST-deletion operations. Also, we need to modify $\gamma_1$ and $\gamma_k$: the portion of $\gamma_1$ (resp., $\gamma_k$) to the right (resp., left) of $u$ (resp., $v$) should be truncated. This can be done by directly updating the information stored in the two corresponding nodes. Finally, $\xi$ should be inserted. Each piece of $\xi$ becomes a new fraction, for which we create a new node storing the information of the fraction and insert it into $T$ via a BST-insertion operation. Now we analyze the time cost of this Replace operation. Let $|T|$ be the size of $T$ before the operation. The time cost for reporting is $O(\log |T| + k)$. Removing $\gamma_2, \dots, \gamma_{k-1}$ takes $O(k \log |T|)$ time. Modifying $\gamma_1, \gamma_k$ and inserting $\xi$ takes $O(\log |T|)$ time (note that $\xi$ has at most two pieces). So the total time of this Replace operation is $O(k \log |T|)$. If $k \le 2$, then the time cost is just $O(\log |T|)$.
If $k > 2$, we observe that $\Omega(k)$ nodes are deleted from $T$ in this Replace operation. Note that the total number of nodes deleted from $T$ cannot exceed the total number of nodes inserted. Over the $m$ iterations, we have in total $O(m)$ Replace operations, each of which inserts $O(1)$ nodes into $T$. Therefore, at most $O(m)$ nodes are deleted from $T$ in total. It follows that the total time cost of all Replace operations is $O(m \log m)$ (note that $|T| = O(m)$ at all times), which is also the total time cost for updating $T$. In other words, $T$ can be updated in amortized $O(\log m)$ time per iteration.
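A minimal Python sketch of the report and Replace steps follows. It assumes, hypothetically, that each fraction is summarized by the x-interval it spans plus a label standing in for its geometric data, and it uses a sorted Python list in place of the balanced BST $T$, so list splicing costs $O(|T|)$ here rather than the $O(\log |T|)$ per insertion or deletion assumed in the analysis. The class and method names are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class BoundaryFraction:
    x_left: float   # left end of the x-interval spanned by this fraction
    x_right: float  # right end of that x-interval
    label: str      # stand-in for the geometric data stored at a BST node


class Boundary:
    """Left-to-right sequence of fractions of an x-monotone boundary."""

    def __init__(self, fractions):
        self.fracs = sorted(fractions, key=lambda f: f.x_left)

    def _index(self, x):
        # index of the fraction whose x-interval contains x
        lefts = [f.x_left for f in self.fracs]
        return bisect_right(lefts, x) - 1

    def report(self, ux, vx):
        """Consecutive fractions gamma_1..gamma_k meeting the portion [ux, vx]."""
        return self.fracs[self._index(ux): self._index(vx) + 1]

    def replace(self, ux, vx, pieces):
        """Replace the portion between ux and vx by `pieces` (the pieces of
        the new chain xi, left to right); return k, the number reported."""
        i, j = self._index(ux), self._index(vx)
        first, last = self.fracs[i], self.fracs[j]
        new = []
        if first.x_left < ux:  # truncate gamma_1 to the left of u
            new.append(BoundaryFraction(first.x_left, ux, first.label))
        new.extend(pieces)
        if vx < last.x_right:  # truncate gamma_k to the right of v
            new.append(BoundaryFraction(vx, last.x_right, last.label))
        self.fracs[i: j + 1] = new  # gamma_2..gamma_{k-1} are deleted here
        return j - i + 1


b = Boundary([BoundaryFraction(0, 2, "A"), BoundaryFraction(2, 4, "B"),
              BoundaryFraction(4, 6, "C"), BoundaryFraction(6, 8, "D")])
print([f.label for f in b.report(1, 7)])  # ['A', 'B', 'C', 'D']
k = b.replace(1, 7, [BoundaryFraction(1, 3, "xi1"), BoundaryFraction(3, 7, "xi2")])
print(k, [(f.label, f.x_left, f.x_right) for f in b.fracs])
# 4 [('A', 0, 1), ('xi1', 1, 3), ('xi2', 3, 7), ('D', 7, 8)]
```

Each call inserts at most four fractions (two truncated ends plus at most two pieces of $\xi$) and removes $k$, which is the accounting behind the amortized bound.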