A Deterministic Partition Tree and Applications
Abstract
In this paper, we present a deterministic variant of Chan’s randomized partition tree [Discret. Comput. Geom., 2012]. This result leads to numerous applications. In particular, for -dimensional simplex range counting (for any constant ), we construct a data structure using space and preprocessing time, such that each query can be answered in time (specifically, time), thereby breaking an lower bound known for the semigroup setting. Notably, our approach does not rely on any bit-packing techniques. We also obtain deterministic improvements for several other classical problems, including simplex range stabbing counting and reporting, segment intersection detection, counting and reporting, ray-shooting among segments, and more. Similar to Chan’s original randomized partition tree, we expect that additional applications will emerge in the future, especially in situations where deterministic results are preferred.
Keywords and phrases:
partition trees, simplex range searching, segment intersection queries, ray-shootings, multi-level data structures2012 ACM Subject Classification:
Theory of computation Computational geometry ; Theory of computation Design and analysis of algorithmsAcknowledgements:
We would like to thank Timothy Chan for the discussions on derandomizing his partition tree.Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Simplex range searching is a fundamental problem in computational geometry. Given a set of points in the -dimensional space for a constant , the goal is to build a data structure so that points of inside a query simplex can be found efficiently. The problem has been extensively studied (see [1, 2, 19] for some excellent surveys). For solving the problem with small space (e.g., near linear), one powerful technique is partition trees, e.g., [7, 14, 16, 17, 18, 23, 24, 25]. In particular, using a partition tree Matoušek [17] built a data structure of space in time and each simplex range query can be answered in time. Subsequently Matoušek [18] gave another more complicated data structure of space with preprocessing time and query time; throughout the paper let be an arbitrarily small positive constant. Chazelle [10] proved that (and for ) is a lower bound on the query time for an -space data structure; it is widely believed that the factor is an artifact of the proof. Although the result of [18] seems to achieve optimal query time with linear space, it is not quite satisfactory. One reason is that partition trees are often used as backbone for designing multi-level data structures and certain properties of the result of [18] makes this challenging, e.g., it has a special root of degree whose children may have overlapping cells and it does not guarantee the optimal crossing number except at the bottom most level of the tree. To address these issues, Chan [7] proposed a randomized partition tree of space that can be built in expected time and the query time is bounded by w.h.p. Comparing to Matoušek’s work [18], Chan’s result has many nice properties, e.g., each node has children, crossing number is optimal (with w.h.p) at almost all layers (except the top few layers), and children’s cells at each node are pairwise disjoint. These properties make Chan’s partition tree quite amenable to multi-level data structures [7, 8, 9].
Simplex range searching has several versions: (1) Counting: compute the number of points of inside the query simplex ; (2) reporting: report all points of inside ; (3) semigroup query (which generalizes the counting query): compute the sum of the weights of the points of inside , assuming that each point of is assigned a weight from a semigroup. The above results [7, 17, 18] are applicable to semigroup queries and can also be modified to solve range reporting with an additive term in the query time, where is the output size. In particular, Chazelle’s lower bound [10] is for the semigroup setting.
Since Chan’s partition tree is randomized, for those who need deterministic algorithms, Matoušek’s partition trees [17, 18] are still the main resort. In this paper, we “partially” derandomize Chan’s partition tree and obtain a deterministic tool, especially for designing multi-level data structures. More specifically, our partition tree is similar to Chan’s (e.g., space, optimal crossing number at all levels except the top few levels, disjointness of children’s cells of each node); however, the degree of each node is logarithmic instead of constant (that is why we used “partially” above). Albeit this drawback, our tree is still powerful enough to have many applications, as discussed below.
Simplex range counting.
For simplex range counting, we construct a data structure of space that can answer each query in time. Note that this does not violate Chazelle’s lower bound [10] as it is for the more general semigroup queries. The preprocessing time is . To achieve the result, we construct our partition tree so that each leaf has points of for an arbitrarily small constant . Because this number is small, we can afford to preprocess all leaves in time by considering all possible configurations for queries; in this way, each query on a single leaf can be answered in time. While this kind of technique may not be quite surprising, it has never been used in simplex range searching, perhaps because previous work has been focusing on the semigroup queries while this technique does not work in that setting. Note that our above result is also applicable to simplex range emptiness queries.
It should be noted that very recently Chan and Zheng [9] obtained similar randomized results (i.e., words of space and query time w.h.p.) for other data structures and also mentioned a possibility of achieving such result for simplex range counting. One difference is that their technique uses bit-packing by assuming each word has bits (note that the -bit word RAM is also a conventional computational model), while ours does not use bit-packing. In addition, their result is randomized while ours is deterministic.
Simplex range stabbing.
Given a set of simplices in , the simplex range stabbing counting problem is to build a data structure to compute the number of simplices containing a query point. Using Chan’s partition tree, Chan and Zheng [9] built a randomized data structure of space (or words of space using the bit-packing tricks) in expected preprocessing time and the query time is w.h.p. Using our new deterministic partition tree, we build a deterministic data structure of space in preprocessing time and the query time is . Previously, the best deterministic results [17, 18] have preprocessing time, space, and query time; or preprocessing time and space, and query time. For the reporting problem (i.e., report all simplices containing a query point), the randomized data structure of [9] has the same performance as above except that the query time becomes w.h.p., where is the output size. We also obtain the same deterministic result as above with query time.
Segment intersection searching.
Given a set of (possibly intersecting) line segments in the plane, the segment intersection counting problem is to build a data structure to compute the number of segments intersecting a query segment. Chan and Zhen [9] built a randomized data structure of space (which again can be reduced to words of space if bit-packing tricks are allowed) in expected preprocessing time and the query time is w.h.p. Using our new deterministic partition tree, we build a deterministic data structure of space in preprocessing time and the query time is . The previously best deterministic result [6] built a data structure of space in time that can answer each query in time.
For the reporting problem (i.e., report all segments intersecting a query segment), the randomized data structure of [9] has the same performance as above except that the query time becomes w.h.p., where is the output size. We also obtain the same deterministic result as above with query time.
Segment intersection detection.
Given (possibly intersecting) line segments in the plane, we wish to build a data structure to decide whether a query line intersects any segment [12, 22]. The previous deterministic result [22] builds an -space data structure in time, with query time. Using our new partition tree, we build an -space data structure with query time and preprocessing time.
Ray-shooting among non-intersecting segments.
Given non-intersecting line segments in the plane, we wish to build a data structure to find the first segment hit by a query ray [3, 6, 20]. The previous best deterministic result [22] builds an -space data structure in time, with query time. With our partition tree, we build an -space data structure with query time and preprocessing time.
Outline.
After introducing notation in Section 2, we present our partition tree in Section 3. The simplex range counting problem is treated in Section 4. The simplex range stabbing and the segment intersection searching problems are discussed in Section 5. We solve the segment intersection detection and the ray-shooting problems in Sections 6 and 7, respectively. Due to the space limit, many proofs and details are omitted but can be found in the full paper.
2 Preliminaries
Let be a set of hyperplanes in . We use to denote the arrangement of , which can be computed in time [13]. For a compact region , we use to denote the subset of hyperplanes of that intersect the relative interior of but does not contain (we also say that these hyperplanes cross ). A cutting for is a collection of closed cells (each of which is a simplex, possibly unbounded) with disjoint interiors, which together cover the entire space [11, 18]. The size of is the number of cells of . For a parameter with , a -cutting for is a cutting satisfying for every cell .
For any , a -cutting of size for can be computed in time [11]. We further have Lemma 1 (similar results were mentioned before [5, 7, 21]). Throughout the paper, always refers to the one in the lemma.
Lemma 1.
(Cutting Lemma) Let be a set of hyperplanes and a simplex in . For any , we can compute a -cutting of cells for whose union is in time, where is the number of vertices of inside and is an arbitrarily small positive constant.
Proof.
We apply Chazelle’s algorithm [11] but only on the region inside (i.e., starting with following the notation of [11]). A detailed analysis for the 2D case is given in [21, Appendix A]. Below we sketch how to modify the analysis in [11] accordingly by following the notation there.
The analysis follows the same approach except that we use to replace in the formula in [11, Page 153]. As such, the subsequent formula becomes
Then, one can prove by induction that , for an arbitrarily small constant . Therefore, the total number of cells of the cutting inside is as stated in the lemma.
For the time analysis, following the same formula in [11, Page 153] and using the above inequality for , we can derive the time complexity as stated in the lemma.
3 Deterministic partition tree
In this section, we present our deterministic partition tree. The following lemma “partially” derandomizes Chan’s partition refinement theorem (i.e., Theorem 3.1 [7]).
Lemma 2.
Let be a set of points and a set of hyperplanes in . Suppose there are interior-disjoint cells whose union covers , such that each cell contains at most points of and each hyperplane crosses at most cells. Then, for any , we can divide every cell into disjoint subcells each containing at most points of , for a total of at most subcells, so that the total number of subcells crossed by any hyperplane in is bounded by
| (1) |
Remark.
Comparing to Chan’s partition refinement theorem, there is an extra in the second term of (1). As will be seen next, due to this extra factor, we have to make each node of our partition tree have logarithmically many children instead of constant. That is why we said before that we “partially” derandomized Chan’s result.
3.1 Proving Lemma 2
This subsection is devoted to the proof of Lemma 2.
Let denote the set of all given cells in Lemma 2. Let be a multiset containing copies of each hyperplane in , for a parameter that is a sufficiently large power of (so that all future multiplicities are integers; the actual value of is not important). For any multiset of , the size of , denoted by , is the sum of the multiplicities of the hyperplanes in . For a cell , let denote the number of vertices of the arrangement of inside , counting multiplicities (note that the multiplicity of a vertex is the product of the multiplicities of its defining hyperplanes); let denote the (multi)-subset of hyperplanes of crossing .
We process the cells of iteratively one by one. The order they will be processed is carefully chosen (in contrast, a random order is used in the proof of [7], which is a major difference between our proof and that in [7]). We will assign indices to the cells following the reverse order they are processed. Suppose we have processed cells, which have been assigned indices and denoted by . In the next iteration, we will find a cell among the unprocessed cells of and process it (and the cell will be assigned index as ). Suppose we now have a multiset after cell is processed (initially let ). Let denote the subset of unprocessed cells of . Hence, .
For each cell , define
Define as the subset of cells with . Let . If , then we define as the cell of that minimizes the value ; otherwise define as the cell of that minimizes . Note that the above way of defining is a key difference from Chan’s approach [7], where is chosen from randomly. We now process in the following three steps (which is similar to Chan’s approach).
-
1.
Construct a -cutting for inside with
for some constant . By the cutting lemma, the number of subcells inside is , which can be made at most for a sufficiently small .
-
2.
We further subdivide each subcell of (e.g., using vertical cuts) so that each subcell contains at most points of . The number of extra cuts is as contains at most points of . The total number of extra cuts for processing all cells of is at most , and thus the total number of subcells after processing all cells is at most , which is at most for .
-
3.
For each distinct hyperplane , multiply the multiplicity of in by , where is the number of subcells of crossed by . Let be the resulting multiset after this.
In the third step, since each of the subcells of is crossed by at most hyperplanes of , we have . Since , we have
We thus obtain
After all cells of are processed, we have a multiset . Recall that and . According to the above analysis, we have
| (2) |
The following lemma gives an upper bound for .
Lemma 3.
.
| (3) |
For any hyperplane , let be the total number of subcells crossed by . Our goal is to prove that is bounded by (1). By the way is produced, we have . Hence,
This proves Lemma 2.
3.2 Constructing the partition tree
Using Lemma 2, we can construct a partition tree in the following theorem.
Theorem 4.
(Hierarchical Partition Theorem) Given a set of points in and a parameter with for a sufficiently large constant , for for a sufficiently large constant , there exists a sequence with and , where each is a collection of disjoint simplicial cells (i.e., each cell is a simplex) whose union covers with the following properties:
-
1.
has only one cell that is and the number of cells of is for (in particular, has cells; the total number of cells in all collections is also ).
-
2.
Each cell of contains at least one point and at most points of for all (in particular, each cell of contains at most points).
-
3.
Each cell in is contained in a single cell of .
-
4.
Each cell of contains cells of for and the cell of contains cells of .
-
5.
Any hyperplane crosses at most cells of for all .
All collections of cells can be constructed in time.
Proof.
We only sketch the proof here. The details can be found in the full paper.
Let be a set of hyperplanes in . We can prove the following critical lemma: With respect to , one can construct a sequence as stated in the theorem with the same properties except that Property (5) only works for any hyperplane in ; also, the construction time is bounded by .
To prove the above lemma, we first assume that is a power of . Then, and . We apply Lemma 2 with to construct the collections iteratively for until . More specifically, consists of a single cell that is , and is obtained by applying Lemma 2 on all cells of . We then let and . By Lemma 2, the properties (1)-(4) hold. The proof for Property (5) as well as the case where is not a power of are given in the full paper. The algorithm implementation for constructing all collections of cells is also presented in the full paper.
The above critical lemma only guarantees the crossing numbers for the hyperplanes in . To make it work for any hyperplane in , we resort to the following lemma, which was known before, e.g., [7].
Lemma 5.
(Test Set Lemma [7]) For a set of points in , we can compute in time a set (called test set) of hyperplanes with the following property: for any set of cells each containing at least one point of , if is the maximum number of cells crossed by any hyperplane of , then the maximum number of cells crossed by any hyperplane is .
Now to prove Theorem 4, we just apply the above critical lemma with as the test set given in Lemma 5, which can be computed in time. Since and , we obtain the theorem from the above critical lemma.
The collections of cells of Theorem 4 naturally form a tree structure, called a hierarchical partition tree, in which each node corresponds to a cell. Specifically, the only cell in is the root. Cells of are the leaves. If a cell contains another cell , then is the parent of and is a child of . As such, each node of the tree has children (the root has children). Although the construction time of Theorem 4 is large, it can usually be reduced (e.g., to time) in applications by constructing an “upper” partition tree using other techniques (e.g., [17]) and then processing the leaves (each containing a small number of points) using Theorem 4, as will be demonstrated in the subsequent sections.
4 Simplex range counting
Given a set of points in , we wish to construct a data structure so that the number of points of inside a query simplex can be quickly computed. If we set in Theorem 4, we can have a data structure of space and query time, which is not as . In the following, we reduce the query time to .
For a region in the plane, let denote the subset of points of in .
Setting with , we apply Theorem 4 to obtain a partition tree with collections , . The total number of cells is . For each cell , we store . By Theorem 4, each cell in contains points of . The runtime to construct is .
Given a query simplex , starting from the root of , for each cell of , if is contained in , then we add to a total count. If is completely outside , then we ignore it. Otherwise, a bounding halfplane of must cross and we proceed on all children of unless is a leaf. In this way, we obtain a set of leaf cells that are crossed by the bounding halfplanes of . The runtime is , which is .
We now build the data structure recursively on the points in for each leaf cell of . If is the query time, we have the following recurrence relation:
| (4) |
where is the query time for each leaf cell , which contains points of .
To solve , we preprocess for each leaf cell recursively as above by using different parameters. Specifically, let , which is . Let be an arbitrarily small constant to be set later. Setting with , we apply Theorem 4 to construct a partition tree in time. The total time for constructing for all leaf cells of is bounded by . Using to handle queries on and following the same analysis as above, we obtain
| (5) |
where is the query time for each leaf of , which contains points of .
In summary, the above first builds a partition tree and then builds a partition tree for each leaf of (so there are two recursive steps but with different parameters). For notational convenience, we still use to refer to the entire tree (by attaching for all leaves ), which has leaves, each containing points. The total space is bounded by since there are only two recursive steps. In Section 4.1, by using the property that is very small, we show that after space and time preprocessing, each simplex range counting query on any leaf cell of can be answered in time (i.e., , which is ; we consider the query on each leaf cell a subproblem). As such, we obtain that . The total preprocessing time is , dominated by the time for constructing . We thus have the following result.
Lemma 6.
Given a set of points in , there is a data structure of space that can compute the number of points of in any query simplex in time. The data structure can be built in time.
We will reduce the preprocessing time to in Section 4.2.
4.1 Solving the subproblems
We first consider halfspace queries and then extend the techniques to the simplex case.
A basic data structure.
Let be a set of points in . We first build a straightforward data structure (called a basic data structure) of space in time that can answer each halfspace range counting query on in time.
Let be the set of dual hyperplanes of . We compute the arrangement of in time and space [13]. We then build a point location data structure on , which can be done in time and supports -time point location queries [11]. In addition, for each face of , we compute the number of hyperplanes above (resp., below ) and store these two numbers at , e.g., by checking every hyperplane of . This finishes our preprocessing, which can be easily done in time and space. The preprocessing time can be reduced to , e.g., by taking an Eulerian tour of the dual graph of the arrangement.
Given a query halfspace , the goal is to compute the number of points of inside . Without loss of generality, we assume that is an upper halfspace. Then, it is equivalent to computing the number of hyperplanes of below , where is the dual point of the bounding hyperplane of . Using the point location data structure, we find the face of that contains ; stores the number of hyperplanes of below it and we return that number as our answer to the query. The query time is thus .
Handling halfspace queries.
Next, we show that after time and space preprorcessing we can answer each halfspace range counting query in time on for each leaf cell of our partition tree .
We build an algebraic decision tree for the arrangement construction algorithm [13] on a set of hyperplanes in so that each node of corresponds to a comparison in the algorithm. The height of is and has leaves. Each leaf of corresponds to a “configuration” of hyperplanes in the following sense. Let be a set of hyperplanes with indices . Following in a top-down manner, we can reach a leaf such that all comparisons of the nodes in the path of from the root to are consistent with ; we say that has the same configuration as . Let and are two sets of hyperplanes each. Let and be the arrangements of and , respectively. If the configurations of and are both the same as a leaf of , then there is a one-to-one correspondence between faces of and faces of such that if a face of corresponds to a face of , then the -th hyperplane of is above if and only if the -th hyperplane of is above .
By the above observation, we do the following preprocessing. For each leaf of , let be a set of hyperplanes whose configuration corresponds to (note that the sets ’s for all leaves can be constructed in time by following , which basically enumuerates all possible configurations for the arrangements of a set of hyperplanes). We construct the above basic data structure on , denoted by . Doing this for all leaves of takes time and space, which is since if we choose a small enough .
For each leaf cell of , recall (if , we add dummy points to so that has exactly points). Let be the set of dual hyperplanes of the points of . We arbitrarily assign indices to the hyperplanes of . Following , we find the leaf of that has the same configuration as , which can be done in time linear in the height of , i.e., ; we associate with . Since has leaves, doing this for all leaves of takes time, which is if is small enough.
Given a query halfspace , suppose we want to compute the number of points of inside for a leaf cell of . This can be done in time as follows. Let be the leaf of associated with . Let be the dual point of the bounding hyperplane of . Without loss of generality, we assume that is an upper halfspace. Hence it is equivalent to finding the number of hyperplanes of below . We apply the point location query algorithm using the data structure with , but whenever the algorithm attempts to use the -th hyperplane of to make a comparison, we use the -th hyperplane of instead. The point location algorithm will eventually return a face, which stores the number of hyperplanes below it; we return that number as our answer to the query of .
Handling simplex queries.
We can easily extend the above idea to simplex queries, by using a multilevel data structure. The details are given in the full paper. In summary, with space and time preprocessing, a simplex range counting query on for any leaf cell of can be answered in time.
4.2 Reducing the preprocessing time
We now reduce the preprocessing time of Lemma 6 to . The idea is to build an “upper partition tree” of depth using Matoušek’s method [17] so that each leaf has points of for a small constant and then construct our data structure in Lemma 6 on each leaf (which form the “lower” part of the partition tree). The details are given below.
A simplicial partition for is a collection , where the ’s are pairwise disjoint subsets forming a partition of , and each is a simplex containing all points of . The subsets ’s are called the classes and the simplices ’s are called cells, which may overlap. The crossing number of is the maximum number of cells crossed by any hyperplane. The following result is from [17].
Lemma 7.
([17]) For a set of points in and a parameter for any constant , a simplicial partition whose classes satisfy and whose crossing number is can be constructed in time.
We build a partition tree by Lemma 7 recursively, until we obtain a partition of into subsets of sizes for a small enough constant to be fixed later, which form the leaves of . Each inner node of corresponds to a subset of as well as a simplicial partition of , which form the children of . At each child of , we store the cell of containing and also store the cardinality . The simplicial partition is constructed using Lemma 7 with for a constant integer to be fixed later, i.e., every internal node of has children. If we recurse times, i.e., the depth of is , then the subset size of each leaf of is . Hence, the number of leaves of is . Next, for each leaf of , we construct the data structure of Lemma 6 on , denoted by , in time. Since , we can make small enough so that the total time of Lemma 6 on all leaves of is . This finishes the preprocessing, which takes time. The space of the data structure is since the depth of is .
Given a query simplex , starting from the root of , for each node , we check whether contains the cell . If yes, we add to the total count. Otherwise, if the boundary of crosses , we proceed to the children of . In this way, we reach a set of leaves of whose cells are crossed by the bounding hyperplanes of . Since the number of leaves of is and the depth of is , which is a constant, the size of is . If is the query time for a subset of size , then we have the following recurrence
Starting with and recursing times (using the same ) gives us
Using , by setting to a constant integer larger than , we have
| (7) |
for another small constant .
Finally, for each leaf node (i.e., those subproblems in (7)), we use the data structure to compute the number of points of in , in time by Lemma 6, which is as . Plugging this into (7) gives us . We thus conclude as follows.
Theorem 8.
Given a set of points in , there is a data structure of space that can compute the number of points in any query simplex in time. The data structure can be built in time for any .
5 Simplex range stabbing and segment intersection searching
Given a set of simplices in , the problem is to construct a data structure to compute the number of simplices that contain a query point (we also say that those simplices are stabbed by the query point). In their data structure, Chan and Zheng [9] utilized a randomized hierarchical partition. We follow their approach and instead use our deterministic hierarchical partition in Theorem 4. Extra effort needs to be taken because the degree of the partition tree in [7] is while ours is logarithmic. In the following, we first briefly review Chan and Zheng’s approach [9] and then discuss how to make changes.
A review of Chan and Zheng’s randomized approach [9].
Each simplex is bounded by hyperplanes. A point stabs a simplex if is in the “correct” side of every bounding hyperplane of (to simplify the discussion, we assume these are lower halfplanes). In the dual space, this is equivalent to having the dual hyperplane of above the dual point of each bounding hyperplane of . Therefore, we have the following problem in the dual space. Given a set of -tuples of points in , we wish to construct a data structure to compute the number of tuples whose points are all below a query hyperplane . We can solve the problem using a multi-level data structure as follows.
Let be the set of the -th points of all tuples of . Apply the simplicial partition of Lemma 7 to to obtain a partition , with a parameter to be fixed later. Given a query hyperplane , for every cell of that is crossed by , we recurse on the subset of tuples whose -th points are in (i.e., apply Lemma 7 on recursively). For each cell completely below , we recurse on the subset of tuples whose -th points are in but as a level- problem (the original problem is a level- problem; the recurrence stops at a level- problem in which we only need to return the size of the subset). As analyzed in [7], choosing with as the global input size (i.e., value is fixed for all levels of the recursion and this makes the recursion depth ) can obtain an space data structure with query time, for any . The preprocessing time is . We summarize this (deterministic) result in the lemma below.
Lemma 9.
([9]) Given a set of simplices in , one can build a data structure of space in time so that the number of simplices containing a query point can be computed in time, for any .
To improve the query time, Chan and Zheng [9] reduced the problem to a special case where the first points in each tuple of all lie in (equivalently, in the primal setting all but one bounding halfplanes of each input simplex are parallel to the -th axis). As such, we can reduce some subproblems in the recurrence to the -dimensional problem and then solve these subproblems by Lemma 9 (where becomes ). If (resp., ) is the query time for the problem in (resp., ), then we have the following recurrence:
| (8) |
By Lemma 9, . If we choose for a small constant , the recursion depth is and thus the space is . The query time is due to a constant-factor blowup. To improve the query time to , a randomized hierarchical partition was used in [9] to replace all recursive simplicial partitions in the preprocessing algorithm for Lemma 9. Here to achieve a deterministic result, we instead use our deterministic hierarchical partition in Theorem 4, as follows.
Our deterministic result.
With for a sufficiently large constant and , we apply Theorem 4 to to obtain a partition tree consisting of , . By Theorem 4, has cells and each cell has at most points of . We define a sequence of numbers as follows. For each , , define (and thus ), with and
where is the smallest integer so that . Hence, , which is . Note that always holds. By definition, and is equal to within a factor of for .
We replace the recursive partitions in the preprocessing algorithm of Lemma 9 with this sequence of partitions of : . As , the space of the data structure is still . The cells of the last partition will be further preprocessed later. The following lemma analyzes the query time.
Lemma 10.
The query time excluding the time spent on the cells of is bounded by .
Lemma 10 leads to the following.
Corollary 11.
For a sufficiently large constant , the query time excluding the time spent on the cells of is bounded by .
Proof.
By Lemma 10, it suffices to show that , which is equivalent to . This in turn is equivalent to , which holds for a sufficiently large since .
Preprocessing cells of .
Recall that has cells each containing points of , and . Let . By Corollary 11, we obtain the following recurrence on the query time : , where is the query time for each subproblem of size since each cell of contains points of . As and , we can write
| (9) |
with .
For notational convenience, let refer to the sequence of partitions , and cells of form the leaves of .
To solve in (9), we preprocess for each leaf cell recursively as above by using different parameters. Specifically, let be an arbitrarily small constant to be set later. Setting , we apply Theorem 4 to construct a partition tree with in time. As above (using the new parameters and ), we only use a subset of partitions of in our query data structure. For notational convenience, we use to refer to those partitions. The total time for constructing for all leaves is bounded by and the total space is still .
With the same analysis as Lemma 10, we can obtain that the query time on excluding the time spent on the leaf cells of is . Define with respect to in the same way as above for with respect to , i.e., has leaf cells and each cell contains points of . As above, holds. Consequently, we obtain , where is the query time for each leaf cell of . Let . We have
| (10) |
Combining (9) and (10) leads to:
| (11) |
Since and , we have . Therefore, we further obtain from the above
| (12) |
Since and , we have . Therefore, we can write for another arbitrarily small constant .
In summary, the above first builds a partition tree and then builds partition trees for all leaf cells of (using different parameters). For notational convenience, we still use to refer to the entire tree (i.e., attaching all leaf trees to ), which has leaves, each containing points. The space is bounded by and the total preprocessing time is . Since is tiny, we show in the full paper that after additional time and space preprocessing, each query on any leaf cell of can be answered in time (i.e., ). Consequently, the query time is bounded by . We can reduce the preprocessing time to using idea similar to Section 4.2 (i.e., build an “upper partition tree” of O(1) depth on top using simplicial partitions [17]). We conclude with the following theorem.
Theorem 12.
Given a set of simplices in , one can build a data structure of space so that the number of simplices containing a query point can be computed in time. The preprocessing time is for any .
5.1 Other related problems
As studied in [9], some related problems (as stated in the following two theorems) can be solved by similar techniques. For each problem, we basically follow the same high-level algorithmic framework as [9] but use our deterministic partition tree instead of the randomized one in [7]; this is very similar to the above simplex stabbing counting problem, so we omit these details. For each problem, however, we still need to come up with a method to answer queries in time for subproblems of small sizes (for an arbitrarily small constant ) and these are discussed in the full paper.
Theorem 13.
Given a set of simplices in , one can build a data structure of space so that the simplices containing a query point can be reported in time, where is the output size. The preprocessing time is for any .
Theorem 14.
Given a set of segments in the plane, one can build a data structure of space so that the number of segments intersecting a query segment can be computed in time (these segments can be reported in additional time, where is the output size). The preprocessing time is for any .
6 Segment intersection detection
Let be a set of line segments in . We wish to build a data structure to decide whether a query line intersects any segment of . The problem can be solved by Theorem 14. This section presents a new method that only needs space while the query time is the same.
Let be the set of endpoints of the segments of . With and , we apply Theorem 4 to to obtain a partition tree consisting of collections , . By Theorem 4, the total number of cells is , each cell in contains at most points of , and the runtime to construct is . For each node , let denote the corresponding cell of , which is a triangle in .
The following is a result from the previous work [22] for a special case of the problem. We will use the result as a subroutine in our approach.
Lemma 15.
([22]) If all segments of intersect a given line segment, then one can build a data structure of space in time so that whether a query line intersects any segment of can be determined in time.
We store the segments of in the partition tree as follows (the idea is similar to [22]). For each segment , starting from the root, for each node whose cell contains (which is true initially when is the root), if is a leaf, then we store at (let denote the set of all such segments stored at ). Otherwise, we check every child of . If has a child whose cell contains , then we proceed on . Otherwise, for each child , if contains an endpoint of , then since does not contain , must intersect an edge of ; we store at (let denote the set of segments stored at ); note that since has two endpoints, there are two such edges but it suffices to store in one such edge. This finishes the algorithm for storing , which takes time. Because is stored at either a leaf or a cell edge, the total space for storing all segments is . The time is .
Next, for each edge of each cell of , since all segments of intersect e, we preprocess using Lemma 15. Doing this for all cell edges of takes time and space. For those segments stored in for all leaves , we will preprocess them into a data structure . Before discussing , we describe the query algorithm and analyze the time complexity.
Given a query line , starting from the root of , for each node , assume that intersects the boundary of , which is true initially when is the root. If is a leaf, then we call the data structure to check whether intersects a segment of . Otherwise, for each child of , for each edge of , we apply the query algorithm of Lemma 15 to check whether intersects a segment of ; further, if crosses , then we proceed on . The following lemma justifies the correctness of the query algorithm.
Lemma 16.
The query algorithm works correctly.
Proof.
If the query algorithm detects an intersection, then it is obviously true that intersects a segment of . On the other hand, suppose intersects a segment , say, at a point . We argue that the query algorithm must detect an intersection. Indeed, according to our query algorithm, all nodes of whose cells are crossed by will be processed. If is stored at a leaf , then is contained in . Since intersects , must cross , and thus must be processed and the data structure will detect an intersection between and .
If is not stored at a leaf, then there must exist an internal node such that and is not in for any child of . According to our preprocessing algorithm, must be stored in for an edge of some cell of a child of . Since and , must cross . Therefore, our query algorithm will process by applying the query algorithm of Lemma 15 on for every edge of every child cell of . When it is applied to , the intersection will be detected.
We now analyze the query time. Recall that has leaves. By Theorem 4, the total number internal nodes of whose cell boundaries are crossed by is , and for each such node, we need to call the query algorithm of Lemma 15 times and each call takes time. As such, the query time other than the time spent on calling for those leaves whose cell boundaries are crossed by (let be the set of all such cells) is since and . By Theorem 4, and for each leaf of .
Let be the query time. Following the above analysis, we obtain the following
| (13) |
where is the query time for each leaf , whose cell contains points of .
To solve , we recursively process for each leaf cell of as above by using different parameters. Specifically, let be the number of endpoints of ; hence . Let be an arbitrarily small constant to be set later. Setting , we apply Theorem 4 to construct a partition tree with in time. The total time for constructing for all leaf cells of is thus bounded by . Using to handle queries on and following the above analysis, we obtain
| (14) |
where is the query time for each leaf cell of , which contains points of .
In summary, the above first builds a partition tree and then builds partition trees for all leaves of (using different parameters). For notational convenience, we use to refer to the entire tree (by attaching all trees to ), which has leaves, each containing points. The space is bounded by . As is small, with additional time and space preprocessing, each subproblem in (15) can be solved in time. This makes the total query time bounded by . We can also reduce the preprocesing time to . See the full paper for the details.
Theorem 17.
Given a set of segments in the plane, there is a data structure of space that can determine whether a query line intersects any segment in time. The data structure can be built in time for any .
7 Ray-shooting among non-intersecting segments
Let be a set of line segments in the plane such that no two segments intersect. The problem is to build a data structure to compute the first segment hit by a query ray.
The following is a result from the previous work [22] for a special case of the problem. We will use it as a subroutine in our approach.
Lemma 18.
([22]) If all segments of intersect a given line segment, then one can build a data structure of space in time so that a ray-shooting query can be answered in time.
We build the partition tree and store in in a way similar to the segment intersection detection problem in Section 6 except that we use Lemma 18 to preprocess for each cell edge of . In addition, for each leaf of , we will build a data structure on .
Given a query ray , starting from the root of , for each node , assume that intersects the boundary of , which is true initially when is the root. If is a leaf, then we use the data structure to find the first segment of hit by as our candidate solution segment. Otherwise, for each child of , for each edge of , apply the query algorithm of Lemma 18 to find the first ray of hit by as a candidate; further, if crosses , then we proceed on . Finally, among all candidate segments, we return the one whose intersection with is closest to the origin of . The correctness follows a similar argument as Lemma 16.
As in Section 6, for each leaf of , we construct the data structure by preprocessing recursively once. The query time analysis follows exactly the same method as in Section 6 since the query time of Lemma 18 is the same as that of Lemma 15, and thus we can also obtain the recurrence (15) with the same value of . As is small, with additional time and space preprocessing, each subproblem in (15) can be solved in time. This makes the total query time bounded by . We thus have the following result.
Lemma 19.
Given a set of segments in the plane, there is a data structure of space that can compute the first segment hit by a query ray in time. The data structure can be built in time.
As before, we can reduce the preprocesing time to . See the full paper for the details.
Theorem 20.
Given a set of segments in the plane, there is a data structure of space that can compute the first segment hit by a query ray in time. The data structure can be built in time for any .
References
- [1] Pankaj K. Agarwal. Range searching, in Handbook of Discrete and Computational Geometry, C.D. Tóth, J. O’Rourke, and J.E. Goodman (eds.), pages 1057–1092. CRC Press, 3rd edition, 2017.
- [2] Pankaj K. Agarwal. Simplex range searching and its variants: a review. In A Journey Through Discrete Mathematics, pages 1–30. Springer, 2017. doi:10.1007/978-3-319-44479-6_1.
- [3] Pankaj K. Agarwal and Jir̆í Matoušek. Ray shooting and parametric search. SIAM Journal on Computing, 22(4):794–806, 1993. doi:10.1137/0222051.
- [4] Pankaj K. Agarwal and Micha Sharir. Applications of a new space-partitioning technique. Discrete and Computational Geometry, 9:11–38, 1993. doi:10.1007/BF02189304.
- [5] Pankaj K. Agarwal and Micha Sharir. Pseudoline arrangements: Duality, algorithms, and applications. SIAM Journal on Computing, 34:526–552, 2005. doi:10.1137/S0097539703433900.
- [6] Reuven Bar-Yehuda and Sergio Fogel. Variations on ray shootings. Algorithmica, 11:133–145, 1994. doi:10.1007/BF01182772.
- [7] Timothy M. Chan. Optimal partition trees. Discrete and Computational Geometry, 47:661–690, 2012. doi:10.1145/1810959.1810961.
- [8] Timothy M. Chan and Da Wei Zheng. Hopcroft’s problem, log-star shaving, 2D fractional cascading, and decision trees. ACM Transactions on Algorithms, 2023. doi:10.1145/3591357.
- [9] Timothy M. Chan and Da Wei Zheng. Simplex range searching revisited: How to shave logs in multi-level data structures. In Proceedings of the 34th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1493–1511, 2023. doi:10.1137/1.9781611977554.ch54.
- [10] Bernard Chazelle. Lower bounds on the complexity of polytope range searching. Journal of the American Mathematical Society, 2(4):637–666, 1989. doi:10.2307/1990891.
- [11] Bernard Chazelle. Cutting hyperplanes for divide-and-conquer. Discrete and Computational Geometry, 9(2):145–158, 1993. doi:10.1007/BF02189314.
- [12] Siu Wing Cheng and Ravi Janardan. Algorithms for ray-shooting and intersection searching. Journal of Algorithms, 13:670–692, 1992. doi:10.1016/0196-6774(92)90062-H.
- [13] Herbert Edelsbrunner, J. O’Rourke, and Raimund Seidel. Constructing arrangements of lines and hyperplanes with applications. SIAM Journal on Computing, 15:341–363, 1986. doi:10.1137/0215024.
- [14] Herbert Edelsbrunner and Emo Welzl. Halfplanar range search in linear space and query time. Information Processing Letters, 23:289–293, 1986. doi:10.1016/0020-0190(86)90088-8.
- [15] Leonidas J. Guibas, Mark H. Overmars, and Micha Sharir. Intersecting line segments, ray shooting, and other applications of geometric partitioning techniques. In Proceedings of the 1st Scandinavian Workshop on Algorithm Theory (SWAT), pages 64–73, 1988. doi:10.1007/3-540-19487-8_7.
- [16] David Haussler and Emo Welzl. -nets and simplex range queries. Discrete and Computational Geometry, 2:127–151, 1987. doi:10.1007/BF02187876.
- [17] Jir̆í Matoušek. Efficient partition trees. Discrete and Computational Geometry, 8(3):315–334, 1992. doi:10.1007/BF02293051.
- [18] Jir̆í Matoušek. Range searching with efficient hierarchical cuttings. Discrete and Computational Geometry, 10(1):157–182, 1993. doi:10.1007/BF02573972.
- [19] Jiří Matoušek. Geometric range searching. ACM Computing Survey, 26:421–461, 1994. doi:10.1145/197405.197408.
- [20] Mark H. Overmars, Haijo Schipper, and Micha Sharir. Storing line segments in partition trees. BIT Numerical Mathematics, 30:385–403, 1990. doi:10.1007/BF01931656.
- [21] Haitao Wang. Unit-disk range searching and applications. Journal of Computational Geometry, 14:343–394, 2023. doi:10.20382/jocg.v14i1a13.
- [22] Haitao Wang. Algorithms for subpath convex hull queries and ray-shooting among segments. SIAM Journal on Computing, 53:1132–1161, 2024. doi:10.1137/21M145118X.
- [23] Dan E. Willard. Polygon retrieval. SIAM Journal on Computing, 11:149–165, 1982. doi:10.1137/0211012.
- [24] F. Frances Yao. A 3-space partition and its applications. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing (STOC), pages 258–263, 1983. doi:10.1145/800061.808755.
- [25] F. Frances Yao, David P. Dobkin, Herbert Edelsbrunner, and Mike Paterson. Partitioning space for range queries. SIAM Journal on Computing, 18:371–384, 1989. doi:10.1137/0218025.
