A Dimension-Reducing Fréchet Simplification Oracle
Abstract
Let be a polygonal curve with vertices in the plane. We construct a data structure of size suited for simplification queries of the following kind. Given a query line and an integer , find a curve on with at most vertices that minimizes the discrete Fréchet distance to , among all such curves. Using our data structure, a query can be handled in time.
More generally, a geometric tree on vertices in the plane can be preprocessed into a near-linear-size structure so that, given a pair , of its vertices, a line , and an integer , one can find a curve on with at most vertices that minimizes the discrete Fréchet distance to the path from to in , in time .
For the general dimension-reduction problem, where is a curve in (), is a real parameter, and a query specifies a -flat () and an integer , we construct a data structure of size , where , that allows us to find a curve on with at most vertices, whose discrete Fréchet distance to is at most times the distance of to , where is such a curve that minimizes the distance to . The query handling time is .
Keywords and phrases:
Computational geometry, discrete Fréchet distance, curve simplification oracle, restricted minimum enclosing disk queriesFunding:
Boris Aronov: Work partially supported by NSF Grant CCF-20-08551.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Computational geometryEditors:
Ho-Lin Chen, Wing-Kai Hon, and Meng-Tsung TsaiSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Curve simplification is an important and extensively studied problem in computational geometry and related fields. Here, we restrict our attention to curve simplification with the quality of the simplification measured using the discrete Fréchet distance between the original curve and its simplification. See below for a comprehensive survey of research in this area.
In this paper, we introduce a novel class of curve simplification problems, in which the simplified curve is required to lie within a specified subspace or region. More formally, let be a polygonal curve in . We are interested in problems where, in addition to standard parameters such as a threshold on the distance between and the desired simplification or a bound on the length (i.e., number of vertices) of , must also reside in a designated subspace or region. Moreover, since it is often desirable to find the optimal simplification of for multiple subspaces or regions, we concentrate on the multi-shot formulation of these problems.
These questions can also be viewed as dimension-reduction problems. That is, we wish to replace an input -dimensional curve by a curve in some lower-dimensional space, while minimizing the Fréchet distance to the input curve.
Specifically, we focus on the following problem. Let be a polygonal curve in . Preprocess into a compact data structure to efficiently support queries that specify a -flat111That is, a -dimensional flat – a -dimensional affine subspace of . () and an integer , and request a curve on of length at most that minimizes the discrete Fréchet distance to . For , the -flat is simply a line in the plane, and we present an efficient solution to the problem, as well as to the more general problem in which the input is a geometric tree rather than a curve . In the latter problem, a query also specifies the path in the tree to which the query applies. For , we present an efficient solution that provides approximate answers to queries, that is, given and , it returns a curve on with at most -vertices which is approximately the best such curve on , in terms of its discrete Fréchet distance to .
Basic definitions.
In this paper we view a finite sequence of points in as a polygonal curve; we refer to as the size or length of and the points as its vertices.
Let and be polygonal curves in . A legal walk of and (or just walk, for short) is a monotonically increasing sequence of pairs , where , , and, in general, when advancing from , is one of the following: (provided ), (provided ), or (provided and ). The cost of a walk is the maximum distance among the distances between the vertices of each pair in the walk. The discrete Fréchet distance between and is the cost of the minimum-cost walk of and .
Given a -flat , , we say that is a -vertex curve on if is a sequence of points on . An at-most--vertex curve on is a -vertex curve on with .
Main problem statement.
In this paper we study the following problem. Given a curve in as above, preprocess it into a near-linear size data structure to efficiently support queries of the following type: “Given a -flat , , and integer , find an at-most--vertex curve on minimizing .” We first focus on the planar version, where is a curve in the plane and queries are lines. See Figure 1 for an example.
We also consider the more general setting, still in the plane, where the input is a geometric tree rather than a curve , and the queries are of the form: “Given two nodes of the tree, a line , and integer , find an at-most--vertex curve on minimizing the discrete Fréchet distance between and the path from to in .”
In other words, we first study problems in which we need to preprocess a given curve (or tree ) in the plane, so that one can efficiently handle simplification queries with respect to a query line. The maximum length of the desired simplification is also a parameter of the query.
Next, we further generalize the problem so that a query may specify an arbitrary polygonal region in which must reside; query time will of course depend on the complexity of the region.
Finally, we consider the problem in three and higher dimensions, where is a curve in , , and queries are -flats, . Since we are interested in a near-linear size data structure, we resort to approximate queries (see discussion in Section 6), with a prespecified error parameter . That is, the answer to a query is an at-most--vertex curve on , such that , where is such a curve minimizing the distance to .
Our results.
We present efficient solutions to these problems. Specifically, given or in the plane, we construct a data structure of size that allows us to answer a query in time; refer to Theorem 7 in Section 3 and Theorem 8 in Section 4 for the exact statements. As an intermediate result, we present an algorithm for the corresponding decision problem (with respect to a curve or tree ). For example, given a line and distance , we can determine in time whether there exists an at-most--vertex curve on within discrete Fréchet distance at most from , and if so, return such a curve with fewest vertices. In Section 5 we extend the data structure so that a query may specify a polygonal region in the plane that restricts the location of the curve .
In higher dimensions, given in and , we construct a data structure of size , where , that allows us to answer a query approximately in time, that is, the discrete Fréchet distance between and the curve that we return is at most times the distance between and the optimal curve; refer to Theorem 12 in Section 6 for the exact statement. Our solution uses coresets.
Related work on curve simplification.
We focus here on (global) simplification under the Fréchet distance in the plane, where the quality of a simplification is the Fréchet distance between the original curve and its simplification .
There are two main versions of the simplification problem. In the first, we are given a threshold and the goal is to compute a curve of minimum length, such that the distance between and is at most . In the second, we are given an integer and the goal is to compute a curve of length at most that minimizes the distance to . Moreover, each of the two main versions gives rise to four standard variants, depending on whether we are using the continuous or discrete Fréchet distance, and whether is restricted (i.e., a subsequence of ) or unrestricted (i.e., any sequence of points); Van de Kerkhof et al. [18] also consider an intermediate curve-restricted option.
The following results are for the first version under the continuous Fréchet distance. Bringmann and Chaudhury [7] presented an algorithm with running time for restricted simplification and established a conditional cubic lower bound. An algorithm with the same running time was also proposed by Van de Kerkhof et al. [18] (see also Van Kreveld et al. [19]). For unrestricted simplification, Guibas et al. [12] provide an algorithm with running time , and Agarwal et al. [2] provide an algorithm, which returns a simplification of length at most the length of an optimal simplification with threshold . The latter two statements can be deduced from the corresponding papers, as observed by Van de Kerkhof et al. [18].
Bereg et al. [4] studied both restricted and unrestricted simplification for each of the two main versions, under the discrete Fréchet distance in three-dimensional space. For the first version, they compute an optimal unrestricted simplification in time and an optimal restricted simplification in time. For the second version, they compute an optimal unrestricted simplification in time, where is the length of the computed simplification, and an optimal restricted simplification in time. Also under the discrete Fréchet distance, Fan et al. [10] considered the so called chain-pair simplification problem in the plane, where the goal is to simultaneously compute restricted simplifications of length for two input curves, such that the distance between the simplifications themselves is also below some threshold.
Finally, under the continuous Fréchet distance, Driemel and Har-Peled [8] presented a near-linear time algorithm that computes a permutation of the vertices of , such that any prefix of vertices of this permutation is an approximation (up to a constant factor) of compared to any polygonal curve with vertices. Subsequently, Filtser [11] slightly improved the approximation factor under the discrete Fréchet distance.
More related work.
A series of papers [16, 15, 5] culminates in an algorithm that preprocesses an -point set in the plane in linear space and time to support time queries of the form “What is the smallest circle centered on the query line and enclosing ?”
There is also a substantial amount of work on so-called range-aggregate queries. Given an -point set in the plane, we want to preprocess it for queries of the following type: compute some aggregate quantity about the points of contained in an axis-aligned query rectangle. While range counting, reporting, and more general semigroup queries have been widely studied [1], some types of queries are not readily decomposable (in the sense of the answer being easily synthesized from the answer for smaller queries). For example, Brass et al. [6] discuss computing the area (or perimeter) of the convex hull, width, and smallest enclosing disk of the points of in the query rectangle. They describe algorithms for preprocessing into a data structure of near-linear size with polylogarithmic-time queries, for all of the above queries except for the width; the width data structure is more expensive. Gupta et al. [13] explore several variants of closest-pair-in-range problems. For example, they show a near-linear-space structure for computing the closest pair of points within the query rectangle in two dimensions with polylogarithmic query time. Our observation in Remark 4 is a further generalization in this direction: Given a point set in the plane, we can preprocess it so that, given a query rectangle and a query line, we can determine the smallest enclosing circle of the points in the rectangle, among all circles centered on the line.
Overview and organization.
We first consider the planar version of the dimension-reduction problem. To answer a decision query of the form “given a line , an integer , and a distance , determine whether there exists an at-most--vertex curve on within Fréchet distance at most from ”, we need to repeatedly perform the primitive operation for different values of : Find the longest (contiguous) subcurve of , beginning at , that fits into a disk of radius centered at a point of . In Section 2, we describe our data structure for the planar version and then use it to implement in time. Our data structure is obtained by combining known data structures in a non-trivial manner. In Section 3, we obtain our first main result, i.e., the solution for the planar version of the dimension-reduction problem. Using the data structure and primitive operations developed in Section 2, we present efficient algorithms for handling both decision queries and “regular” queries, where the algorithm for regular queries is based on the one for decision queries and on the primitive operation (developed in Section 2) which, given a (contiguous) subcurve of , returns the radius of the smallest disk centered at a point of that contains the subcurve. In Section 6, we obtain our second main result, i.e., the solution for the -dimensional version, , of the dimension-reduction problem. The algorithms in this general version are similar to those in the planar version, however, we need to replace the exact primitive operations above by their approximate -dimensional counterparts. These latter operations are implemented using coresets. Finally, in Sections 4 and 5, we extend our result for the planar version in two directions. In Section 4, the input is a geometric tree (rather than a curve ), and queries specify, in addition to and , a path in , by providing its start and end vertices. In Section 5, queries specify a polygonal region rather than a line , to contain the simplified curve.
2 Facts and tools
2.1 Facts
For two points in the plane, let be the Euclidean distance between them; many of our observations generalize to other norms, but in this version we focus on the Euclidean metric, for simplicity of presentation.
For a point set , let be the radius of the minimum enclosing circle (or mec, for short) of , and let be the radius of the mec of centered at . Moreover, let be the radius of the mec of centered at a point of line , and let be the radius of the mec of centered at , as a function of . We give the proof of the following standard fact below, for completeness.
Fact 1.
Both and are convex and have a unique minimum.
Proof.
Since , and is a convex function of , is convex. Therefore is also convex, as a restriction of the convex function to a line.
As for uniqueness of minimum, we prove it for ; it also proves the claim for .
For a contradiction, suppose and are both minima of . Then the disks of radius centered at and , respectively, each cover . Therefore is contained in the lens . Let be a point on the open segment . Then the disk centered at and just covering the lens contains and has a smaller radius, contradicting the assumption that and were minima of .
2.2 Primitives in the plane
We now list the operations that will be useful in solving the two-dimensional problems addressed in this paper. Each of the operations below depends on the query line .
-
: The smallest-radius disk centered at a point of and containing a set of points; let and denote its radius and center, respectively.
-
: The interval of (which may be empty) that is the locus of centers of disks of radius containing .
-
: Is there a disk of radius centered at a point of and containing a set of points? Equivalent to or to ; the former version is sometimes more efficient.
-
: For a sequence of points, its longest prefix (which may be empty or all of ) that fits into a disk of radius centered at a point of . In practice, returns the length of the prefix, so 0 if it is empty.
2.3 Data structures in the plane
We now describe the data structures we employ to implement the above operations. Consider a polygonal curve of length . For clarity of presentation we assume that is a power of two; the general case requires minimal straightforward modifications. We denote by the (contiguous) subcurve , for .
The hierarchical decomposition.
The hierarchical decomposition of is a binary tree, in which the root corresponds to , its left and right children to and , respectively, and so forth. The leaves correspond to the single-vertex curves. Each node of the tree corresponds to a canonical subcurve of (we will use the term “canonical set” when the ordering of the vertices in the subcurve is immaterial). A general subcurve corresponds to the concatenation of canonical subcurves, which can be identified from the value of the indices and in time.
We often build data structures based on the hierarchical decomposition of a curve, where at each node we store an additional structure tailored to its corresponding canonical subcurve.
Farthest-point Voronoi diagram as a map.
For a given set of points in the plane, the farthest-point Voronoi diagram is a planar map decomposing the plane into convex regions , one for each , where is the locus of all points in the plane for which is the farthest point in . is a planar map with vertices, edges, and faces. Each face is a convex unbounded polygon. We preprocess the planar map for point location, so that, given a query point , one can find the region containing , and thereby the farthest point of in , in time.
Centroid decomposition.
Consider a free tree of degree at most three. A centroid edge is an edge of whose removal splits into two trees of almost equal size, with between and of the vertices of . It is known to exist in any degree-at-most-three tree. The existence of a centroid edge naturally induces a centroid decomposition of : the root is a centroid edge . Each subtree left and right of the root is, recursively, a centroid decomposition of the two trees formed by the removal of from . Each internal node corresponds to an edge of . Each leaf corresponds to either a single edge or to two edges of .
Farthest-point Voronoi diagram as a tree.
Given a set in the plane, the 1-skeleton of (i.e., the collection of its vertices and edges, viewed as a graph with virtual vertices at the “ends” of infinite ray edges) is a tree, which, for in general position, has degree three. For simplicity of exposition, we will assume that the points of are in general position; refer to [5] for how to handle degeneracies in when constructing the centroid decomposition.
We now describe what is essentially the centroid-decomposition-based data structure from [5] for computing the smallest disk enclosing a given set of points, with the center on the query line , i.e., . For a fixed set , the centroid decomposition of requires linear space and time to build. In our data structure, for a path , we build the centroid decomposition for all canonical subsets of , which requires overall space and time.
We store the centroid decomposition of together with some additional geometric information, as follows: Each internal node corresponds to an edge of , which is a line segment222Unbounded edges have to be handled slightly differently; we omit the details. separating regions and . Let be the midpoint of . There exist two infinite rays and [5] emanating from , one fully contained in and one in , respectively. Store these two rays together with the centroid edge and points and at the current node of the centroid decomposition tree. Notice that the polygonal line splits the plane into two regions: one contains the subtree corresponding to the left child of and the other – the subtree of its right child.
The full structure .
Let be the data structure constructed as follows: We start with the hierarchical decomposition of . At each node of the decomposition, which corresponds to a canonical subcurve of , we store both in the planar map and centroid decomposition form.
2.4 Implementations
We now describe how to efficiently implement the primitives and analyze their running times.
2.4.1 Implementation of
If is a canonical set of size .
The algorithm in [5] preprocesses in linear space and time to support such an operation in time; see Section 2.3 for a description of the centroid decomposition data structure. Roughly speaking, the search proceeds by descending the centroid decomposition of , narrowing down the portion of the query line containing the disk center, at a constant cost per level. At a leaf of the decomposition, at most three farthest neighbors in remain, which allows to identify the center and radius directly. We use a more elaborate version below to solve the two-set version of the problem.
If is a union of two canonical sets.
Let and be two canonical sets of total size at most , and let be the query line. Put . We wish to compute , the smallest disk with center on enclosing ; let denote its center, to be computed. We will need the following information about the sets and (already stored in ): and viewed both as planar maps, preprocessed for point location, and as planar trees, stored as centroid decompositions, see Section 2.3.
For clarity of explanation, identify with the -axis and let denote a generic point of . The center of the desired disk is the point minimizing . Each of the three functions are convex on , with a unique minimum, by Fact 1.
Note that the following sidedness test can be performed in time: Given a point , we can determine whether lies before, after, or at along . Indeed, after querying with in and , we can determine the distance from to the farthest points of (possibly more than one) and therefore the direction in which this distance decreases along – that’s the direction towards ; if no such direction exists, then . The bottleneck of this operation is the lookup of in the planar map and , at a cost of .
Finally, we describe the computation of . Throughout the search we maintain the current segment that is guaranteed to contain . We initially set .
We now descend the centroid decompositions of and . Without loss of generality, suppose we are currently descending the decomposition of , with the current node corresponding to edge and its two rays splitting the plane into wedges and ; intersects at most twice. At each of the intersection points we perform our sidedness test and eliminate part of as the possible position of . The result is an, in general, smaller updated segment , contained fully in or in and containing . We descend the decomposition of to the corresponding child of .
Before terminating the current step, we check if the updated crosses . If it does, we split at the intersection point, check what side contains by another sidedness test, and shrink further. Notice that the resulting segment is guaranteed not to meet any of the edges of the discarded child of in the centroid decomposition, nor itself. Thus we inductively maintain the invariant that the current segment can only intersect the edges of corresponding to the current subtree in its centroid decomposition.
We continue in this manner, until we reach the bottom of both hierarchies. At a single-edge leaf (a two-edge leaf is handled similarly) of a centroid decomposition of, say, , one edge remains, and we apply the same trick to shrink the segment to only one side of the edge, belonging to, say the Voronoi region of . Similarly, we narrow things down to one site . By construction, now , and intersects no edges of nor of . In particular, for all points on , is the furthest point of and – of . We check one of three possibilities: (a) could be the only point determining : we project orthogonally to and check if its projection lies in . If so, check that is contained in the disk of radius centered at (i.e., that ) – then this is . (b) We check if is the only point defining similarly. Finally, (c) we construct the intersection of the bisector with (or, equivalently, with ). This point is the center of and its radius is . It is easily checked that the processing at the bottom of the two hierarchies takes work once the identity of and is known.
Since the descent of the two logarithmic-height hierarchies takes time for one step (the bottleneck in the sidedness test is consulting a point with known position in one of the FVDs for its furthest point in the other set: in each sidedness test we already know where we are in one diagram, but not in the other), we reach the bottom and the solution in time. We conclude that
Lemma 2.
The operation , where are two canonical sets of total size , can be performed in time.
If is a union of canonical sets.
Lemma 3.
Let be canonical subsets of of total size at most , let their union be , and let be a line. We can compute in time.
Proof.
For each pair , with , we compute by an application of the two-set procedure above and return the largest disk. The runtime bound is immediate, so we focus on correctness.
Since is determined by at most two points of (refer to [16, Observation 1]), it follows that the desired disk is indeed , if one of the defining points comes from and the other from with ; if the two points come from the same set , then for every , . If , we are done. Otherwise, notice that only the largest of the disks , , can be , as any smaller disk would not enclose , by definition of and Fact 1.
Remark 4.
Lemma 3 is interesting in its own right, since it allows us to solve problems such as the following. Let be a set of points in the plane. Construct a data structure of near-linear size to support, in time, queries of the form: Given an axis-aligned rectangle and a line , return the smallest enclosing circle of with center on .
2.4.2 Implementation of
If is a single canonical set.
Let be the desired locus of candidate centers such that ; it is a contiguous interval on , if non-empty, since is convex; see Fact 1. This interval is empty if , consists of a single point if , and is a positive-length interval with two endpoints if . Since is convex and monotone along each of the two half-lines into which is split by , we can binary search along each half-line to determine the endpoints of . Using the same data structure as in [5] allows us to conduct the binary search in time. We omit the easy details.
If is the union of canonical sets .
Compute , for , and return . Running time is dominated by the cost of computing the feasible intervals, so it is .
If is the union of a dynamic collection of canonical sets.
The goal is to maintain the feasibility interval subject to addition and removal of canonical sets. For each set, store its feasible interval and, using two heaps and , maintain the rightmost left endpoint and leftmost right endpoint of the intervals. (i) On insertion of a new set , compute , add to , and to . (ii) On deletion of , remove the endpoints of the stored interval from and . (iii) The current feasible interval is always , unless , in which case it is empty.
If we use a standard binary heap to implement and , insertion costs , deletion , and current feasible region is available in time , where is the current number of sets in and is the total number of points involved.
2.4.3 Implementation of
This test is logically equivalent to or to . The former is more efficient unless is a single canonical set (in which case the latter is more straightforward, but equally efficient asymptotically).
2.4.4 Implementation of
This operation can be implemented as a binary search along , using as a blackbox for candidate prefixes as the decision procedure, at the cost of a multiplicative logarithmic factor in running time. However, sometimes we can do better, as detailed below.
If is a canonical subcurve.
At any moment during a binary search in the hierarchical decomposition of (which itself is a canonical subcurve of ), there are canonical sets comprising the current prefix being tested. As binary search progresses down the hierarchy, either a new canonical set is added to the current prefix (if the prefix is too short), or is replaced by a new, smaller (if the prefix is too long). We use the dynamic structure for feasible interval maintenance from Section 2.4.2 to keep track of this information and check if the current feasible interval is empty. Since the total number of insertions into and deletions from our collection of canonical sets is , the overall cost is .
If is a subcurve of .
Using the hierarchical decomposition of , we express as a concatenation of canonical subcurves . In time, we compute the feasible intervals for all subcurves . Then in time , we compute the largest index such that fits into a disk of radius , but does not, by computing the largest such that (if , returns all of ). We then binary search within to identify the longest prefix of that can be concatenated to and still fit into a disk of radius in time, using a slight modification of the above single-canonical-subcurve algorithm. The overall time complexity is .
3 Discrete Fréchet distance simplification on a line
In this section we present an algorithm for answering queries, which refer to the already preprocessed input curve , of the following type: Given a line and an integer find an at-most--vertex curve on minimizing ; that is, find a curve , where and , that realizes the expression
We assume that , since otherwise the curve clearly achieves the minimum, where is the orthogonal projection of on .
Recall that we write , for , to denote the (contiguous) subcurve of . For a point in the plane, the distance from to the vertex of farthest from (nearest to) it, is denoted by (). Notice that
where is the radius of (see Section 2.2).
Let be the desired curve and let be a walk of and of cost . We may assume that does not match two consecutive points in to the same point in . That is, does not contain two consecutive pairs of the form , since, if it does, we may delete the pair from (and the point from , if it was only matched to ) to obtain a legal walk (and a curve ) of the same cost.
This assumption implies that, in order to compute , we need to find an optimal partition of into subcurves. That is,
We conclude that, given and , our task is to find and a sequence of indices that minimize the expression
The desired sequence is then , where is the center of .
3.1 The Algorithm
Given the supporting data structures and set of primitive operations, the query algorithm is pretty simple. We first present an algorithm for the decision problem, i.e., given , , and , determine if there exists an at-most--vertex curve on such that .
The decision algorithm.
The function Decision in Algorithm 1 uses a simple greedy recursive approach for solving the decision problem.
Lemma 5.
Let be a line in the plane, an integer, and . The call returns true if and only if there exist points on , , such that . The running time is times the cost of a call to with a subcurve of , i.e., .
Proof.
The time bound is obvious. Moreover, it is clear that, if the call returns true, then there exist such points – namely, the centers of the disks corresponding to the computed prefixes. We now prove the other direction by induction on .
If there exists a point on such that , then the call at the top level of the recursion will return and the algorithm will return true just at the beginning of the next level (since ).
Assume now that for any preprocessed curve , if there exist fewer than points on , , such that , then the call (applied to ) returns true. Moreover, assume there exist points on , , such that . We need to show that the call returns true. Indeed, consider a walk of and of cost , and let be the largest index such that matches to . By definition, the index returned by the call at the top level of the recursion is at least as large as . Now, since the cost of is and matches to the prefix , we know that and therefore there exists a sequence of at most points on , which is or a suffix of it, such that . Therefore, by the induction hypothesis, the call at the top level of the recursion (where ) returns true, and thus returns true.
Note.
A minor modification of the Decision function, with the same preprocessing of , can solve the problem of finding, given line and distance , the shortest sequence of points on lying within discrete Fréchet distance of . (No such set exists if is smaller than the maximum distance from a point of to ; this can be efficiently checked with no asymptotic slowdown.) The query can be answered in time , where is the length of and is the length of the shortest such curve . Performing the same query on a subcurve of can be done at the same asymptotic cost.
The optimization algorithm.
We now present the query algorithm. That is, given and , find an at-most--vertex curve on minimizing . The function Optimization in Algorithm 2 actually returns the distance , but it can be easily modified (within the same bounds) to return as well.
Lemma 6.
Let be a line in the plane, and let . The call Optimization() returns the smallest , for which there exists a sequence of at most points on such that . The running time is .
Proof.
If , then the algorithm returns . Indeed, let be the smallest index such that Decision() returns true, so the algorithm sets . But since , the latter inequality must be an equality, so . As for , it is set to either by the recursive call with (if ) or by the else clause (if ). Hence, the algorithm returns as claimed.
Assume now that and that the algorithm returns the correct value for . Let
and let be a sequence of points on such that . Finally, let be a minimum-cost walk of and (i.e., a walk of cost ). Let be the shortest prefix for which Decision() returns true. Then and by definition . Notice that the inductive assumption implies that the value returned by the recursive call (if such a call is needed) is correct. We consider the cases and separately.
If , is feasible. This radius is equal to the distance from to . Notice that, for each point of , the distance between and is a lower bound on , so we must have . Moreover, since , the algorithm sets to and therefore returns the correct value.
Assume therefore that and let . We know that the call Decision() returned false, so . We distinguish between two subcases.
-
If , we claim that the value returned by the recursive call must be greater or equal than , and therefore the algorithm returns the correct value, . Indeed, if , then we found points , where and are the centers induced by the recursive call, such that , contradicting the assumption that .
-
If , then let be the prefix assigned to by the minimum-cost walk . Notice that , since otherwise would be greater or equal than . So (which as we recall is greater than ) is the smallest distance between and a curve on of length at most , while is the smallest distance of either (if ) or a proper suffix of (if ) and such a curve on . Thus, . But, since is feasible, we conclude that and the algorithm returns the correct value (since ).
As for the running time, at each level of the recursion we issue calls to the decision algorithm, where each of them is preceded by a call to , with the appropriate index , to obtain . Since the running time of the decision algorithm is and the running time of is , and the number of levels is , we conclude that the total running time is .
This bound can be slightly improved by splitting the search into two stages: First find the canonical subcurve that contains the point that we are looking for, by adding subcurves one by one. Then, look for within by binary search using the tree of the canonical subcurve . The search consists of steps, but in each step the set of canonical subcurves to which we need to apply either grows by one, or we replace the last subcurve with a shorter one. Hence, does not need to recompute the smallest disk for all pairs of subcurves, but only for the pairs involving the new curve. This effectively speeds up by a logarithmic factor, resulting in total query time of .
The following theorem summarizes our main result.
Theorem 7.
Let be a polygonal curve with vertices in the plane. One can preprocess in time into a data structure of size , so that given a line and a positive integer , an at-most--vertex curve on minimizing (and the corresponding distance) can be found in time.
4 Extension to geometric trees
A geometric tree is a tree whose vertices are points in the plane and whose edges are line segments connecting the corresponding points. In this section, we expand the data structure to support preprocessing a geometric tree so that, given two vertices of the tree, an arbitrary line , and an integer , one can find an at-most--vertex curve on that minimizes the Fréchet distance to the path from to in . We use ideas similar to those in [3], which in turn use heavy-path decomposition of [17].
More specifically, the heavy-path decomposition of is a collection of vertex-disjoint paths in which collectively cover all the vertices of . Any path in is a concatenation of subpaths of (some of) the paths and, this information can be extracted from the heavy-path decomposition data structure in time [17] (in fact, a faster implementation is possible, but would not help us, as it is not the bottleneck here). We then construct the data structure , for each curve , as above. Since the paths are disjoint, the total size of the data structures involved is , where is the number of vertices in .
Given a decomposition of as above, we simply run Algorithm 2 (which, in turn, invokes Algorithm 1), on a concatenation of subpaths of preprocessed heavy paths. Decision algorithm requires an implementation of for such a concatenation. Since we can partition the subpaths further, each into canonical subpaths, we are effectively running the algorithm for a subpath, as in Section 2.4.4, but with canonical subpaths, resulting in running time of for and for Decision.
As for Optimization, conceptually we run Algorithm 2 with minor modifications to speed it up. There are at most levels of recursion. We will describe and analyze one level and multiply the runtime by .
At a fixed level of recursion, we need to find the shortest prefix of the concatenation of subpaths that satisfies the Decision condition. We first find so that satisfies it, but does not (or , in which case we are done), adding one subpath at a time, sequentially. This takes at most invocations of Decision, for a total of time. Now we subdivide into a logarithmic number of canonical subcurves and apply the same logic to find the canonical subcurve that contains the (still unknown) point we want. This involves more invocations of Decision, for a total of time.
However, we did not account for the work of the algorithm. It is easily checked that during these first two steps, over all the calls to , it is enough to compute the minimum enclosing radius with center on for at most pairs of canonical subpaths constituting the subpaths , at a cost of per pair; so the total cost of all the runs at the current level of recursion up to now is at most .
Finally, we use the hierarchical decomposition of to find desired point , essentially as in the proof of Theorem 7. We observe that descending the hierarchy in binary search, we always either add the last canonical subpath or replace the last canonical subpath, so the number of pairs of subpaths that need to be recomputed by the algorithm on each descent step is only (rather than ) and therefore each call takes time, for a total of over the entire binary search.
To summarize, each level of recursion in this implementation of Optimization takes time. We conclude that that the full running time of Optimization is , which finishes the proof of the following theorem:
Theorem 8.
Let be a geometric tree on vertices. One can preprocess in time into a data structure of size , so that given vertices and in , a line , and a positive integer , the at-most--vertex curve on minimizing the discrete Fréchet distance between and the path between and in , can be found in time.
5 Extension to polygonal regions in the plane
Let be a (topologically closed) polygonal region in the plane bounded by edges and be an -vertex curve. Given an integer , we show how to find an at-most--vertex curve that minimizes the discrete Fréchet distance . That is, we find a curve , with and , that realizes the expression
Note that need not be unique.
Observe that we only need to implement one new primitive, which we refer to as , that computes the smallest-radius disk centered at a point of for a subcurve of (and its corresponding and ). Using one can perform binary search over to implement at the cost of a logarithmic factor in the running time. The decision and optimization functions (Algorithms 1 and 2) will use the new primitive and otherwise remain the same.
We now sketch how to implement for a subcurve . Let be the center of the mec containing the subcurve and centered at a point of .
First we find the center of the unconstrained mec of , using the data structure in [9] for the mec of a planar point set. We partition into canonical subcurves (as in Section 2.3) and preprocess each subcurve as in [9]. Using the algorithm from Section 3 of [9], we can compute the mec of the union of the subcurves, that is, of in deterministic time (refer to the discussion after Lemma 2 in that paper). Alternatively, Theorem 2 in [9] provides an algorithm for finding the mec in expected time .333The statements are phrased in terms of solving an LP over an intersection of polyhedra in , but a later discussion points out that the same logic, with minor modifications, applies to finding the mec of a discrete point set in the plane as well [9]. If the center of the resulting mec lies in , we are done: we found .
Otherwise, since is a convex function of , the center lies on the boundary of . For each bounding segment of , we compute for the supporting line of in time ; refer to Lemma 3. Once again, if the center lies in , this gives a candidate mec, otherwise the endpoint of closest to provides such a candidate. After repeating the process for each boundary edge of , we return (the center of) the smallest disk found. The total cost is thus .
To summarize, can be computed in deterministic time or in expected time , assuming .
We now implement by binary search over the prefix length using in expected time . Expected running time of decision will be and that of optimization will be (see the time analysis of Lemma 6 for details).
We did not optimize the logarithmic factors.
Theorem 9.
Let be a polygonal curve with vertices in the plane. One can preprocess in time into a data structure of size , so that given a polygonal domain bounded by segments, an at-most--vertex curve that minimizes the discrete Fréchet distance and the corresponding distance can be found in expected time . Alternatively, it can be found in deterministic time .
6 Discrete Fréchet distance simplification on a -flat
In this section we address the general version of the dimension-reduction problem mentioned in the introduction. That is, is a polygonal curve in , for some constant , and we need to preprocess into a compact data structure to efficiently support queries that specify a -flat () and an integer , and request a curve on of length at most that minimizes the discrete Fréchet distance to .
It is unlikely that there exists a solution to this problem with bounds similar to those that we obtained in Section 3, since it seems that, on the one hand, a primitive analogous to , where “disk” is replaced by “ball” and is replaced by , is essential, but on the other hand, it is probably impossible to implement such a primitive efficiently in higher dimensions, i.e., in polylogarithmic time after near-linear time preprocessing of .
Therefore, since we are interested in a solution to the general dimension-reduction problem with near-linear time preprocessing and polylogarithmic time query, we resort to approximate queries. Specifically, we present such a solution (for a prespecified parameter ) that given returns a curve on of length at most that minimizes the discrete Fréchet distance to up to a factor of (i.e., , where is such a curve that minimizes the discrete Fréchet distance to ).
Set ; then, and . We first define the primitive , which returns a -approximation of the smallest ball centered at a point of and containing a set of points. That is, let denote the radius of the returned ball, then , where is the radius of the smallest ball with center on .
We use coresets to implement , assuming is the set of points corresponding to a subsequence of . More precisely, we preprocess so that given a subsequence , for , one can compute in time, where is a constant, see below. Before presenting the details, we provide a brief overview of the coreset-related definitions and results that we use.
Coresets.
Let be a compact set in . Let be a direction. The directional width of in direction is the minimum distance between a pair of hyperplanes with normal containing between them. Fix a number , . A subset of is an -coreset for with respect to directional width, if it has the property that , for all directions ; since , . We say that a subset of is an -coreset for for farthest neighbors if, for any point , . It follows that if is any ball containing , then the ball , obtained by scaling around its center by a factor of , contains .
The following facts are known (refer for example to [14, Section 23.1]):
-
A compact set in has an -coreset for directional width of size .
-
Such an -coreset for a set of points can be constructed in time .
-
An -coreset for directional width is an -coreset for farthest neighbors.
-
If are -coresets for for width or farthest neighbors, then is an -coreset for for width or farthest neighbors.
As we are interested in coresets for farthest neighbors, with a slight abuse of terminology, we will use unqualified “coreset” to denote them hereafter.
Smallest ball with a center on a flat.
We recall the classical randomized LP-type algorithm of Welzl [20] for finding the smallest enclosing ball of an -point set in in expected time (with the implied constant depending on ). An inspection of the algorithm reveals that it can be used essentially unmodified to compute the smallest enclosing ball of whose center is constrained to lie in a given -flat; such a ball will be defined by at most points.
Implementation of .
We now explain how to implement , which is the basic building block of the approximation algorithm described above. As in Section 2.3 we once again construct a hierarchical decomposition of , storing of length , then and as its children and so forth, thereby creating a hierarchy of canonical subcurves. With each canonical subcurve we associate its coreset for farthest neighbors, of size at most . An arbitrary subcurve of is the union of canonical subcurves. If we collect the points of the corresponding coresets, we obtain a coreset for of size . Running the exact minimum-enclosing-ball with center on a -flat algorithm on the set , by the above discussion, gives an -approximation of corresponding ball for . Indeed, we obtain the smallest possible ball with the center constrained to the flat and enclosing . By the coreset property contains . There cannot be a substantially smaller ball with center on the flat and containing , as such a ball would also contain and would contradict minimality of .
The expected running time is dominated by that of the ball-finding routine, which is , concluding our description of an implementation of .
The decision algorithm.
We now wish to devise a decision algorithm, similar to the one in Section 3. Given , our decision algorithm will return true if there exists a sequence of at most points on such that . It will return false otherwise, in which case we may only conclude that for any sequence of at most points on , .
For this purpose, we define the primitive for a sequence of points and a radius . It returns a prefix of , such that returns a radius that is at most where as returns a radius that is greater than . In practice, returns the length of the prefix, so 0 if it is empty. Thus, if , , then and . (If , then , and if , then .)
Observation 10.
If returns the largest for which , then .
To implement we perform a binary search. That is, we set and call . Now, if , then we increase , and otherwise we decrease , etc. Thus, the cost of a call to with a subcurve of is time.
We now present the decision algorithm (see Algorithm 3).
The following lemma follows from Observation 10.
Lemma 11.
Let be a -dimensional flat in , an integer, and . If the call returns true, then there exists a sequence of points on , such that . If it returns false, then for any sequence of points on , . The running time is times the cost of a call to with a subcurve of .
Finally, we run the optimization algorithm in Section 3, using the decision algorithm above (and replacing by ). The following theorem summarizes the section’s main result.
Theorem 12.
Let be a polygonal curve with vertices in . Let , and set . One can preprocess in time into a data structure of size , so that given a -dimensional flat and a positive integer , an at-most--vertex curve on such that (and the corresponding distance) can be found in time, where is such a curve that minimizes the discrete Fréchet distance to .
7 Discussion and open problems
References
- [1] Pankaj K Agarwal. Range searching. In Handbook of Discrete and Computational Geometry, pages 1057–1092. Chapman and Hall/CRC, 2017.
- [2] Pankaj K. Agarwal, Sariel Har-Peled, Nabil H. Mustafa, and Yusu Wang. Near-linear time approximation algorithms for curve simplification. Algorithmica, 42(3-4):203–219, 2005. doi:10.1007/S00453-005-1165-Y.
- [3] Boris Aronov, Tsuri Farhana, Matthew J. Katz, and Indu Ramesh. Discrete Fréchet distance oracles. In Wolfgang Mulzer and Jeff M. Phillips, editors, 40th International Symposium on Computational Geometry (SoCG 2024), pages 10:1–10:14, 2024. doi:10.4230/LIPIcs.SoCG.2024.10.
- [4] S. Bereg, M. Jiang, W. Wang, B. Yang, and B. Zhu. Simplifying 3D polygonal chains under the discrete Fréchet distance. In LATIN 2008: Theoretical Informatics, 8th Latin American Symposium, Búzios, Brazil, April 7-11, 2008, Proceedings, volume 4957 of Lecture Notes in Computer Science, pages 630–641. Springer, 2008. doi:10.1007/978-3-540-78773-0_54.
- [5] Prosenjit Bose, Stefan Langerman, and Sasanka Roy. Smallest enclosing circle centered on a query line segment. In Proceedings of the 20th Annual Canadian Conference on Computational Geometry, Montréal, Canada, August 13-15, 2008, 2008.
- [6] Peter Brass, Christian Knauer, Chan-Su Shin, Michiel H. M. Smid, and Ivo Vigan. Range-aggregate queries for geometric extent problems. In Anthony Wirth, editor, Nineteenth Computing: The Australasian Theory Symposium, CATS 2013, Adelaide, Australia, February 2013, volume 141 of CRPIT, pages 3–10. Australian Computer Society, 2013. URL: http://crpit.scem.westernsydney.edu.au/abstracts/CRPITV141Brass.html.
- [7] Karl Bringmann and Bhaskar Ray Chaudhury. Polyline simplification has cubic complexity. J. Comput. Geom., 11(2):94–130, 2020. doi:10.20382/JOCG.V11I2A5.
- [8] A. Driemel and S. Har-Peled. Jaywalking your dog: Computing the Fréchet distance with shortcuts. SIAM J. Comput., 42(5):1830–1866, 2013. doi:10.1137/120865112.
- [9] David Eppstein. Dynamic three-dimensional linear programming. ORSA Journal on Computing, 4(4):360–368, 1992. doi:10.1287/IJOC.4.4.360.
- [10] Chenglin Fan, Omrit Filtser, Matthew J. Katz, Tim Wylie, and Binhai Zhu. On the chain pair simplification problem. In Frank Dehne, Jörg-Rüdiger Sack, and Ulrike Stege, editors, Algorithms and Data Structures - 14th International Symposium, WADS 2015, Victoria, BC, Canada, August 5-7, 2015. Proceedings, volume 9214 of Lecture Notes in Computer Science, pages 351–362. Springer, 2015. doi:10.1007/978-3-319-21840-3_29.
- [11] O. Filtser. Universal approximate simplification under the discrete Fréchet distance. Inf. Process. Lett., 132:22–27, 2018. doi:10.1016/j.ipl.2017.10.002.
- [12] Leonidas J. Guibas, John Hershberger, Joseph S. B. Mitchell, and Jack Snoeyink. Approximating polygons and subdivisions with minimum link paths. Int. J. Comput. Geom. Appl., 3(4):383–415, 1993. doi:10.1142/S0218195993000257.
- [13] Prosenjit Gupta, Ravi Janardan, Yokesh Kumar, and Michiel H. M. Smid. Data structures for range-aggregate extent queries. Comput. Geom., 47(2):329–347, 2014. doi:10.1016/J.COMGEO.2009.08.001.
- [14] Sariel Har-Peled. Geometric approximation algorithms. Number 173 in Mathematical Surveys and Monographs. American Mathematical Soc., 2011.
- [15] Arindam Karmakar, Sasanka Roy, and Sandip Das. Fast computation of smallest enclosing circle with center on a query line segment. Inf. Process. Lett., 108(6):343–346, 2008. doi:10.1016/J.IPL.2008.07.002.
- [16] Sasanka Roy, Arindam Karmakar, Sandip Das, and Subhas C. Nandy. Constrained minimum enclosing circle with center on a query line segment. Comput. Geom., 42(6-7):632–638, 2009. doi:10.1016/J.COMGEO.2009.01.002.
- [17] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. J. Comput. Syst. Sci., 26(3):362–391, 1983. doi:10.1016/0022-0000(83)90006-5.
- [18] Mees van de Kerkhof, Irina Kostitsyna, Maarten Löffler, Majid Mirzanezhad, and Carola Wenk. Global curve simplification. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 67:1–67:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.ESA.2019.67.
- [19] Marc J. van Kreveld, Maarten Löffler, and Lionov Wiratma. On optimal polyline simplification using the Hausdorff and Fréchet distance. J. Comput. Geom., 11(1):1–25, 2020. doi:10.20382/JOCG.V11I1A1.
- [20] Emo Welzl. Smallest enclosing disks (balls and ellipsoids). In Hermann A. Maurer, editor, New Results and New Trends in Computer Science, Graz, Austria, June 20-21, 1991, Proceedings [on occasion of H. Maurer’s 50th birthday], volume 555 of Lecture Notes in Computer Science, pages 359–370. Springer, 1991. doi:10.1007/BFB0038202.
