Property Testing of Curve Similarity
Abstract
We propose sublinear algorithms for probabilistic testing of the discrete and continuous Fréchet distance – a standard similarity measure for curves. We assume the algorithm is given access to the input curves via a query oracle: a query returns the set of vertices of the curve that lie within a radius of a specified vertex of the other curve. The goal is to use a small number of queries to determine with constant probability whether the two curves are similar (i.e., their discrete Fréchet distance is at most ) or they are “-far” (for ) from being similar, i.e., more than an -fraction of the two curves must be ignored for them to become similar.
We present two algorithms which are sublinear assuming that the curves are -approximate shortest paths in the ambient metric space, for some . The first algorithm uses queries and is given the value of in advance. The second algorithm does not have explicit knowledge of the value of and therefore needs to gain implicit knowledge of the straightness of the input curves through its queries. We show that the discrete Fréchet distance can still be tested using roughly queries ignoring logarithmic factors in . Our algorithms work in a matrix representation of the input and may be of independent interest to matrix testing.
Our algorithms use a mild uniform sampling condition that constrains the edge lengths of the curves, similar to a polynomially bounded aspect ratio. Applied to testing the continuous Fréchet distance of -straight curves, our algorithms can be used for -approximate testing using essentially the same bounds as stated above with an additional factor of .
Keywords and phrases:
Fréchet distance, Trajectory Analysis, Curve Similarity, Property Testing, Monotonicity TestingFunding:
Anne Driemel: Affiliated with Lamarr Institute for Machine Learning and Artificial Intelligence.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithmsAcknowledgements:
Anne Driemel would like to thank Ronitt Rubinfeld and Ilan Newman for helpful discussions in the early stages of this research. The authors thank Sariel Har-Peled for feedback on an earlier version of this manuscript.Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
We initiate the study of property testing for measures of curve similarity, motivated by the need for fast solutions for curve classification and clustering. Thus, our research lies at the intersection of property testing and computational geometry. While property testing is a very broad area, we believe this intersection has received far less attention than it deserves. We also believe that property testing for well-studied measures such as the Fréchet distance is especially well-motivated due to its connections to other problems studied on the curves, such as clustering [3, 8], similarity search [1, 7, 14] and map reconstruction [4, 5].
Typically in property testing, we are given access to a (large) data set and the goal is to very quickly assess whether the data has a certain property. The classical notation of correctness is typically not suitable for property testing since these algorithms are intended to be randomized algorithms that access (very small) parts of data via queries. Instead, a property testing algorithm is considered correct if it can satisfy the following two conditions, with a probability close to 1 (e.g., at least probability for some small value of ): first, if the input has the desired property, the algorithm must return accept and second, if the input is “far” from having the property (under some suitable definition of “far”), the algorithm should reject the input. Our algorithms have one-sided error, meaning, they will not produce false negatives and thus rejection means that the input does not have the desired property.
For further reading on property testing, see [24, 2]. Here, we quickly review the main motivation for property testing in general and then we focus on computational geometry.
Property testing algorithms can be useful in many different scenarios. In particular, if the input is extremely large, it makes sense to obtain a quick approximate answer before deciding to run more expensive algorithms. This is especially useful if the testing algorithm has one-sided error; in that case, if the test returns reject, there is no need to do any extra work. For the Fréchet distance, there are applications [1, 7, 14] where negative filters are used to minimize expensive Fréchet distance computations; testing may be useful in such cases. Property testing is also useful if most inputs are expected to not have the desired property, or the input distribution is such that each input either has the property or it is “far” from having that property. In addition, the definition of property testing has a strong connection to topics in learning theory such as probably approximately correct (PAC) learners and this was in fact one of the important motivations behind its conception [23].
Another motivation for property testing is when small errors can be tolerated or objects that are close to having the desired property are acceptable outcomes; in this view, property testing is in spirit similar to approximation algorithms where an approximate answer is considered to be an acceptable output. In the context of similarity testing we can also compare it to a partial similarity measure. In fact, our chosen error model will be very close to a partial Fréchet distance [6]. For more details on motivations to study property testing see [17, 24]. See also the aforementioned references for more detailed results on property testing as surveying the known results on property testing is beyond the scope of this paper.
Computational geometry has a long tradition of using randomization and sampling to speed up algorithmic approaches [22, 19, 20, 18]. Property testing has received some attention within computational geometry, but is much less explored compared to other areas. There are fast and efficient testers for many basic geometric properties, such as convexity of a point set [13], disjointness of objects [13], the weight of Euclidean minimum spanning tree [10, 11, 12], clustering [21], the Voronoi property of a triangulation [13], ray shooting [9, 13], as well as LP-type problems [15].
Unlike in graph property testing, where it is assumed that we can sample a vertex uniformly at random, and query for its neighbors, it seems that, for geometric data, there is no commonly agreed upon query model. Different types of queries are used to access the data in the above cited works, such as range queries, cone nearest neighbor queries, queries for the bounding box of the data inside a query hyper-rectangle. In our paper, we use queries to the free space matrix of the two input curves, see Section 3 for the precise definition.
We mention that there is also work on matrix testing [16]. Compared to our setting, the input size is defined as – the number of entries needed to specify the matrix. In our case, we assume the input size to be , the maximum of column and row dimension of the matrix.
2 Preliminaries
Let be a metric space. We say a (discrete) curve in is an ordered point sequence with for all . We call the points of the curve vertices. We denote by the number of vertices in and by its length, which is defined as . The subcurve of between and is denoted by . A curve is called -straight if for any two vertices and in , we have . Denote by the set of integers from to and by the integer lattice of times integers. Given two curves and , we say that an ordered sequence of elements in is a coupling of and , if it starts in , ends in and for any consecutive tuples in it holds that . We define the discrete Fréchet distance between and as follows
For brevity, we simply call this the Fréchet distance between and . The free space matrix of and with distance value is an matrix , where the -th column corresponds to the vertex of and the -th row corresponds to the vertex of .
The entry has the value if and otherwise.111Note we use and in switched roles compared to the conventions in the literature on the free space matrix. Our notation makes the cost-function of the path more intuitive. A coupling path is the path through the integer lattice defined by the coupling . We define the cost of such a path as . Note that the Fréchet distance between and is at most if and only if there exists a coupling path with cost from to .
Our analysis is based on a property of the free space matrix. We first define this property and then link the property to a certain class of well-behaved input curves.
Definition 1 ().
Let be a free space matrix of curves and . We say that is if, for any tuples and with , it holds that and .
See Figure 1 for a visualization. A consequence of this property is that zero-entries within one row or within one column cannot be too far away from each other.
Observation 2.
Suppose is . If we have , then we have . If we have , we have .
In the following, we argue why we think the -locality assumption is reasonable. Note that will not exceed on any input. There are classes of well-behaved curves that satisfy the locality assumption for some . Lemma 3 below shows that the free space matrix of approximate shortest paths is -local. For simplicity, we impose the assumption that the lengths of the edges are bounded by a fixed multiple of , the query radius of the Fréchet distance. In the full version, we show that our approach can be adapted to work as well if the lengths of the edges are bounded by a constant multiple of any fixed value.
Lemma 3.
Let and be -straight curves with edge lengths in for some constant . Let be their free space matrix with distance value . Then, is -local.
Proof.
Let be such that and . So we have and . We also have because of the edge length constraint, where is replaced by if . Using the edge length constraint and -straightness we derive . These observations together with the triangle inequality yield
The other inequality is shown by reversing the roles of and .
3 Problem definition and results
The problem we study in this paper is the following. Assume we want to determine for two curves and , each consisting of vertices,222For ease of notation, our analysis assumes the input curves have the same number of vertices. if their Fréchet distance is at most a given value of . We do not have direct access to the input curves. Instead, we have access to an oracle that returns the information in a given row or column of the -free space matrix in the form of a sorted list of indices of zero-entries. We call this a query and we want to determine using as few (sublinear in ) queries as possible. Note that from the point of view of a data structure setting, our query corresponds to a classical ball range query with a vertex of one curve and returns the list of vertex indices of the other curve that are contained in the ball of radius centered at .
Our bounds on the number of queries will be probabilistic and will hold under a certain error model that allows for the coupling path to pass through a bounded number of one-entries. These are entries where the corresponding points are far from each other. In the full version, we discuss alternative error models and how they relate to our chosen error model.
Definition 4 (-far).
Given two curves and consisting of vertices each, we say that and are -far from each other if there exists no coupling path from to in the -free space matrix of cost or less.
Definition 5 (Fréchet-tester).
Assume we are given query-access to two curves and , and we are given values and . If the two curves have Fréchet distance at most , we must return “yes”, and if they are -far from each other w.r.t. the Fréchet distance, the algorithm must return “no”, with probability at least .
Results.
We present two algorithms for testing whether a pair of input curves have discrete Fréchet distance at most a given real value . Assume that the algorithm has query access only to the -free space matrix of the input curves and that this matrix is -local (see Definition 1). The first algorithm requires knowledge of , and it uses queries (Theorem 18). Note that this bound is intrinsically independent of while can be at most , by the definition of locality. The second algorithm does not require knowledge of in advance (and thus it can be applied to any matrix) and requires many queries (Theorem 25). This algorithm gains its knowledge of with a variant of monotonicity testing (Theorem 23). Using Lemma 3 these results directly imply the same bounds for the discrete Fréchet distance of -straight curves with bounded edge lengths. In Section 6 we show how to apply our algorithms to testing the continuous Fréchet distance. As a result, we obtain essentially the same bounds with respect to -straight curves while removing the assumption on the edge lengths entirely (Theorem 29).
Techniques.
Our algorithms use the concept of permeability, which tests whether a specified subcurve has small partial Fréchet distance to any subcurve of the other curve. Concretely, the algorithm samples a block of consecutive columns (or rows) in the -free space matrix and checks if there is a coupling path of cost passing somewhere through this block (see Algorithm 1). If the algorithm happens to find a block that does not satisfy this condition, then we know that the global Fréchet distance is larger than the distance parameter and the algorithm can safely return ’no’. The difficulty lies in proving that after a certain number of random queries, the algorithm can return ’yes’ with sufficient certainty. To this end, we show that if the two curves are -far then the algorithm, which tests subcurves of varying sizes up to roughly , is likely to sample a permeability query that would return ’no’.
In the full version, we discuss a simple tester for the Hausdorff distance and a simple -approximate Fréchet tester. This algorithm merely tests for columns and rows that contain one-entries only and are therefore somewhat blocked for coupling paths in the free space matrix. The example in Figure 2 shows that such simple queries are not sufficient to test the Fréchet distance exactly.
4 Testing the discrete Fréchet distance
In this section, we describe and analyze a Fréchet tester under the assumption that the free space matrix is -local for a given value of .
4.1 The algorithm
The idea of Algorithm 1 is to sample a number of columns and rows and check whether there is locally a coupling path of cost zero possible. The following definition classifies when such a (local) path exists.
Definition 6 (Permeability).
We say a block of consecutive columns (resp., rows) from index to index is permeable if there exists a coupling path of cost zero that starts in column (resp., row) and ends in column (resp., row) .
If a column (resp. row) is individually not permeable, i.e. it contains only one-entries, we call it a barrier-column (resp. barrier-row). Note that any non-permeable block of rows or columns is a witness to the fact that no global coupling path of cost exists and that the algorithm can answer “no”.
The algorithm first tests if the first or last entry (i.e. or ) is a one-entry. If not, it queries randomly sampled columns and rows and checks if any of them is a barrier-column or barrier-row. In Lines 6-8 we sample random intervals of length for each from to . We then check all corresponding blocks of columns and rows for permeability and if all blocks are permeable, we return “yes”.
-
1.
If or then return “no”.
-
2.
repeat times:
-
3.
sample an index uniformly at random from .
-
4.
if row or column of is a barrier-column or barrier-row then return “no”.
-
5.
, , let be a set of intervals and set .
-
6.
for do:
-
7.
sample different indices uniformly at random from .
-
8.
for each do: add to .
-
9.
foreach do
-
10.
if block of consecutive columns is not permeable then return “no”.
-
11.
if block of consecutive rows is not permeable then return “no”.
-
12.
return “yes”.
To check if a block of columns (or rows) is permeable, the algorithm first performs queries to obtain the positions of zero-entries in these columns (or rows), then we build the induced subgraph of the grid for these zero-entries, connect all neighboring zero-entries according to the possible steps of a coupling and then connect all zero-entries of the last column to a sink and all zero-entries of the first column from a source. It remains to check if there is a path from source to sink, which can be done in linear time since the graph is acyclic. The running time is linear in the total number of zero-entries queried. We call this a permeability query.
4.2 Basic properties of permeability
For two curves and with vertices each, and a -local free space matrix, Algorithm 1 returns “no” only if it has determined that no coupling path of cost zero from to can exist (by finding blocks of rows or columns that are not permeable). We need to show that it returns “yes” correctly with sufficiently high probability.
For technical reasons, our analysis uses a specific kind of coupling paths that are more restricted than the ones given in Section 2. We introduce them here and argue that it suffices to consider these coupling paths only.
Definition 7 (diagonal-restricted path).
A diagonal-restricted path through the free space matrix is a path corresponding to a coupling in which for any consecutive tuples of the form it holds that .
In other words, a diagonal-restricted path , is allowed to take diagonal steps in the free space matrix only if both entries are zero. Hence, any one-entry visited by must be reached by either a vertical or horizontal step in and it must be left by a vertical or horizontal step in . This definition, which might seem overly technical at first sight, will simplify a lot of our arguments. The following observations show that it suffices to consider diagonal-restricted paths instead of all coupling paths in the analysis of a Fréchet tester.
Observation 8.
For two curves and with vertices each, the following holds:
-
i)
There exists a diagonal-restricted path of cost zero from to if and only if there exists a coupling path of cost zero from to .
-
ii)
If there is no diagonal-restricted path of cost at most through from to , the curves are -far. See Figure 3.
-
iii)
If and are -far, there is no diagonal-restricted path of cost at most through from to .
Below, we present three structural lemmas that help with the analysis of Algorithm 1. For the proofs of these lemmas, we refer to the full version. The first lemma considers a subpath of a diagonal-restricted path that starts and ends in a zero-entry and visits only one-entries in between. We show that the bounding box of the two zero-entries must otherwise be filled with one-entries. Note that this lemma does not hold for general coupling paths of Section 2. Using it will simplify our analysis and is the main motivation for introducing Definition 7.
Lemma 9 (Box of ones).
Let be a diagonal-restricted path of minimum cost. Let be zero-entries such that visits only one-entries between them. Then, for all tuples we have that .
We give a definition describing when two columns or rows satisfy the constraints for -locality.
Definition 10.
The columns and pass -locality if any two zero-entries and in columns and satisfy . Similarly the rows and pass -locality if any two zero-entries and in rows and satisfy .
A column (resp. row) can also pass -locality with itself if (resp. ). Intuitively, two columns pass -locality, when all their zero-entries are not too far away vertically. Note that all columns (resp. rows) pass -locality with each other if is -local.
The second lemma shows that if an optimal diagonal-restricted path contains a long sequence of one-entries, then there must be many barrier-columns and barrier-rows.
Lemma 11 (Barrier Lemma).
Let be a diagonal-restricted path of minimum cost. Suppose that there are two zero-entries such that visits no zero-entry and at least one-entries in between them and suppose that all columns in and all rows in pass -locality with each other. Then there is a total of at least barrier-rows between and and barrier-columns between and .
The third lemma shows that if an optimal diagonal-restricted path has a long stretch with relatively many one-entries on the path, there cannot be any long coupling path of cost zero in the same sequence of rows or columns of the matrix , implying impermeability.
Lemma 12 (Impermeability Lemma).
Let be a diagonal-restricted path through of minimum cost. Suppose with (resp. ) correspond to zero-entries in and the subpath of from to visits at least one-entries and suppose that any column in and any row in passes -locality with itself. Then, the block of columns (resp. rows ) is not permeable.
4.3 Analysis of the algorithm
Algorithm 1 samples a range of intervals in Lines 6-8 which are then tested with a permeability query. For this subroutine we can prove the following lemma. For a proof see the full version.
Lemma 13.
Let be a set of all non-empty intervals from and let be a set of unknown intervals such that the length of each interval is at most and each element of is contained in at most of these intervals for some constant . Lines 6-8 of Algorithm 1 select a subset of intervals with such that with probability at least , at least one of the intervals in is contained in at least one interval in .
The next lemma shows that Algorithm 1 is very likely to return “no” in Line 4 if there are many barrier-columns and barrier-rows. The proof can be found in the full version.
Lemma 14.
Lemma 15 (Main Lemma).
Let be -local and . Suppose the total number of barrier-columns and barrier-rows is at most . Let be a diagonal-restricted path with lowest cost through and suppose that . Then, with probability at least at least one sampled interval in the set during the execution of Algorithm 1 corresponds to a non-permeable block of columns or rows.
Proof.
We first divide the path into a minimum number of groups that each contain more than one-entries and start and end with a zero-entry. A visualization of this can be seen in Figure 5. Here, the start of one group is the same entry as the end of the prior group. We do this greedily by following until we have visited one-entries. Then, we end the group at the next zero-entry after that and repeat the process. Note that the last group may contain fewer than one-entries.
We will first show that there are not too many groups that contain more than one-entries. Then, we will show that within the remaining groups, there are sufficiently many groups that are not too big. With these groups we then form the set for the interval sampling in Lemma 13 (i.e., each interval in will induce a permeability query that fails).
A group that contains more than one-entries is called hollow, otherwise, it is dense.
Claim 16.
At most of the one-entries of are in hollow groups.
For a proof of the claim, we refer to the full version. So we know that at least of the one-entries of are in dense groups. Since a dense group contains at most one-entries, we have at least dense groups. Let be a dense group that starts in and ends in . We say that is the size of . This corresponds to the sum of the number of columns and rows that it visits. In total, every row and column is counted at most twice by the definition of the groups. So the sum of all group sizes is at most . We observe that at least half of the groups (which are at least many) have size at most . We call these groups small. At last, we create a set of intervals so that we can apply Lemma 13 to it. Each of the intervals in will have the property that either the block of consecutive columns or the block of consecutive rows is not permeable.
Let be a small dense group and suppose it starts in and ends in . We observe that unless is the very last group, it visits at least one-entries and therefore either visits more than rows or more than columns. In the first case, we can apply the Impermeability Lemma to see that the block of columns is not permeable. So we add to . In the second case, we can apply the Impermeability Lemma to see that the block of rows is not permeable. So we add to . In the end the set has at least elements and each of them has length at most . If we take a closer look at the definitions of the groups, we see that all columns are only contained in one group except for the first and last column of each group. So we can say that every column is contained in at most two groups, i.e. at most two intervals of . The same holds for rows. So, we have that each element of is contained in at most four of the intervals in . So we can apply Lemma 13 to with , , to see that the algorithm samples one of the intervals from with probability at least . So the permeability query fails for the columns or for the rows with the respective interval, which yields the desired statement.
Lemma 17.
Algorithm 1 performs queries.
Proof.
Line 4 performs queries. By Lemma 13, Lines 6-8 sample a set of intervals that contain a total of rows and columns. Each of these columns and rows cause a query in a permeability query. As and , the claim follows.
Theorem 18.
Let and be given. Let and be curves with vertices such that their free space matrix with value is -local and is known. Then, Algorithm 1 is a Fréchet-tester that uses queries.
Proof.
If the Fréchet distance of and is at most , there is a coupling path of cost through and Algorithm 1 always returns “yes”. So we only need to determine the probability of a false positive. Suppose and are -far and therefore, a minimum cost diagonal-restricted path visits more than one-entries. If there are at least barrier-columns and barrier-rows, we return “no” with probability at least in Line 4 by Lemma 14. If there are fewer than barrier-columns and barrier-rows and we get past Line 4, we return “no” with probability at least in Lines 10 and 11 by the Lemma 15. So in any case, the probability for a false positive is at most . The number of queries follows from Lemma 17.
5 Locality-oblivious testing of the discrete Fréchet distance
Our main goal in this section is to obtain a Fréchet-tester that does not require an additional input (the value of ). Our approach for this is to first run a “tester” to find an estimate for and then run Algorithm 1 with that parameter. Unfortunately, this combination is more complicated than just running two testers, for a number of different reasons: First, Algorithm 1 assumes that the matrix is -local whereas the best we can do is to show that with some probability close to 1, it is “close” to being -local. This change breaks the proof of correctness of Algorithm 1. The second complication is that we cannot plug in any tester for locality, we need a specific one for our analysis that also guarantees some additional properties. Guaranteeing these properties requires more queries than just testing for locality.
5.1 Testing the locality of the Free Space Matrix
To this end, we present a specific error model that is suitable for our purposes. In the following we define how -locality of a matrix can be violated. First, it could be that two columns or rows do not pass -locality according to Definition 10. In this case, we say that the respective columns or rows fail -locality. We introduce a second definition that considers a single column and looks for violations of -locality within that column and within the rows that have a zero-entry in this column.
Definition 19.
A single column (resp. row ) fails second order -locality if either it contains two zero-entries (resp. ) with (resp. ) or if there exists a zero-entry and two zero-entries and (resp. and ) such that (resp. ). Otherwise, it passes second order -locality.
Intuitively, a single column fails second order -locality if it has a zero-entry in a row that also has two zero-entries more than distance apart (similar intuition holds for the rows). See Figure 6. Observe that if a matrix is not -local then there either exists a pair of columns or rows that fail -locality or a column or row that fails second order -locality.
We now want to define what it means for a matrix to be -close to -local. This definition is specifically designed to fit the Fréchet-tester in Section 5. Other, maybe more natural definitions of -close may need different property-testers. However, testing -locality is not our main focus in this paper.
Definition 20 (strongly -local).
We say that a free space matrix is strongly -local for some and if we can partition the set of all columns and rows into two disjoint sets and with such that the following properties hold: i) columns and rows are in , ii) any two columns and any two rows in pass -locality and iii) any row or column in passes second order -locality. We call the set the witness set and the columns and rows in ignored.
If there are rows and columns such that no two of these rows or columns fail -locality and none of these columns or rows fail second order -locality, then they serve as a witness set to see that the matrix is -local. Because of this, we will check for failures of -locality and second order -locality among the columns and rows.
Observation 21.
When we query two columns and , we can check if they fail -locality by comparing the lowest zero-entry in column with the highest zero-entry in column and vice versa. Also, after querying a column , testing for second order -locality can be done with at most additional row queries at all zero-entries of the column .
Our testing algorithm is quite simple and, in fact, it is an extension of the classic “monotonicity” testing in an array [2]. The algorithm is given in Algorithm 2.
-
1.
binary search tree of height on with root .
-
2.
repeat times:
-
3.
First two iterations: and . Later: Choose unif. at random.
-
4.
If column or row fails second order -locality then return “no”.
-
5.
foreach on the --path in do:
-
6.
If columns and fail -locality then return “no”.
-
7.
If rows and fail -locality then return “no”.
-
8.
return “yes”.
Lemma 22.
Given and , Algorithm 2 returns “yes” if is -local; if is not strongly -local, it returns “no” with probability at least . It uses queries.
For a detailed proof of Lemma 22 we refer to the full version. The proof relies on standard techniques for monotonicity testing [2]. The Locality-tester is given a value as an argument. Next, we show that it is also possible to obtain a good estimate of the locality, as well. The straightforward approach is to start with a small estimate for , e.g., , use Algorithm 2 and every time it fails, we can double the and keep repeating. Our algorithm is very similar to this approach. The difference is merely that we use different values throughout the iterations and we call the locality tester multiple times for the same .
Theorem 23.
Suppose is -local and we are given . We can design an algorithm that returns a value such that the matrix is strongly -local with probability at least . The number of queries used is .
Proof.
The algorithm operates in rounds. In the -th round (starting from ) we run Locality-tester( for times. If the locality-tester returns “no” in any of these runs, we go to round , otherwise, we return .
First, we examine the number of queries. Let . Then, the algorithm terminates before or in round since no two columns or rows fail -locality for any and all rows and columns pass second order -locality. So we have at most rounds. In round , we perform queries by Lemma 22. The total number of all queries from round to the last round (which is at most ) is .
By Lemma 22, “no” answers are always correct. However, with probability at most , the algorithm can return “yes” even though is not -local. As all the executions of the algorithm are independent, the probability that we get “yes” in all calls of the locality-tester in round is at most . Using a union bound, the probability that we return a value while is not -local is at most .
5.2 The combined algorithm
We now present our algorithm for testing the Fréchet distance without knowing the locality parameter in advance. Algorithm 3, uses the results from Section 5.1 to first estimate the locality of the matrix and then uses the Fréchet-tester from Section 4 to decide if the Fréchet distance of the two curves is smaller or equal to .
-
1.
Output of the Algorithm in Theorem 23 for and .
-
2.
return Fréchet-tester1().
The following lemma is analogous to the main lemma (Lemma 15) from Section 4. For a proof of this lemma we refer to the full version.
Lemma 24.
Let and let be strongly -local and . Suppose the total number of barrier-columns and barrier-rows is at most . Let be a diagonal-restricted path with lowest cost through and suppose that . Then, with probability at least at least one sampled interval in the set in the call of Algorithm 1 during Algorithm 3 corresponds to a non-permeable block of columns or rows with probability at least .
Theorem 25.
Let and be given. Let and be curves with vertices such that their free space matrix is -local. Then, Algorithm 3 is a Fréchet-tester that uses queries.
Proof.
The Locality-tester uses queries. Since , we need queries to test for barrier-columns and barrier-rows and queries for the permeability queries. The total number of queries follows since .
If the Fréchet distance of and is at most , Algorithm 3 will always return “yes”. So we have to bound the probability of false positives. Assume that and are -far and therefore, a minimum cost diagonal-restricted path through has cost higher than . Suppose that is strongly -local for the output of the algorithm from Theorem 23. By the same theorem, this happens with probability at least . If the matrix contains more than barrier-columns and barrier-rows, the subroutine that calls Algorithm 1 returns “no” with probability at least by Lemma 14. If the matrix contains at most barrier-columns and barrier-rows and the algorithm proceeds until the interval sampling, we are in the setting of Lemma 24 and we return “no” with probability at least . Therefore, the probability that the algorithm returns “no” is at least . So the probability of a false positive is at most .
6 Testing the continuous Fréchet distance
This section describes how the Fréchet-testers for the discrete Fréchet distance can be used to serve as -approximate Fréchet-testers for the continuous Fréchet distance. First, we define -far for the continuous Fréchet-distance. Then, we explain how to use our Fréchet-testers on instances of the continuous Fréchet distance with this error model.
We first define the continuous Fréchet distance. We set to be the set of all continuous non-decreasing functions with and . Let and be polygonal curves with and vertices. The continuous Fréchet distance is defined to be
Now we want to introduce a possible error model for the continuous Fréchet distance. Given a value and , , we define
as the partial difference, namely the total length of the portions of the two curves and that are not matched with distance at most by and . Building on this, we define the partial Fréchet difference to be
This is essentially the same as the partial Fréchet distance as defined in [6].333Note that, in line with their definition, we assume the reparameterizations to be (piecewise) differentiable.
Definition 26 (-far).
Given two polygonal curves and , we say that and are -far from one another w.r.t. the continuous Fréchet distance if .
Definition 27 (-approximate Fréchet-tester).
Assume we are given query-access to two curves and , and we are given values and . If the curves have continuous Fréchet distance at most , we must return “yes”, if they are -far from each other w.r.t. the continuous Fréchet distance, the algorithm must return “no”, with probability at least .
In order to use the first Fréchet-tester introduced in this paper, we transform the polygonal curves into discrete curves by subsampling vertices along the edges of the curves. Denote by the discrete curve that arises from if we subsample vertices along the edges of such that visits a curve segment of length in between any two vertices for some . The last edge might be longer than but shorter than . Hence, has edges. This yields that . If is -straight, so is for any . The next lemma shows that the notions of -far are also related.
Lemma 28.
Let . If and are -far from each other, then and are -far from each other.
For a proof see the full version. We adjust our query oracle such that it can access the free space matrix of and . So we can apply the Fréchet-testers to and for an adequate choice of . We show that for given and , we can choose such that this yields a -approximate Fréchet-tester for the continuous Fréchet distance of and . Let . For , we test and for the distance . We observe that is -local if and are -straight. If , we have and if and are -far, we have that and are -far for for . So, if we know , we can apply the first Fréchet-tester for and . If is unknown, we apply the second Fréchet-tester for the same values.
Theorem 29.
Let be given and assume . Let and be -straight curves with . If is known, there exists a -approximate Fréchet-tester w.r.t. the continuous Fréchet distance that performs at most queries. If is unknown, there exists a -approximate Fréchet-tester w.r.t. the continuous Fréchet distance that performs at most queries.
A natural consequence of the above theorem is that for -straight curves, the number of queries is bounded in terms of and the aspect ratio of the input.
References
- [1] Julian Baldus and Karl Bringmann. A fast implementation of near neighbors queries for Fréchet distance (GIS cup). In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 99:1–99:4. ACM, 2017. doi:10.1145/3139958.3140062.
- [2] Arnab Bhattacharyya and Yuichi Yoshida. Property Testing: Problems and Techniques. Springer Nature, 2022.
- [3] Milutin Brankovic, Kevin Buchin, Koen Klaren, André Nusser, Aleksandr Popov, and Sampson Wong. (k, )-medians clustering of trajectories using continuous dynamic time warping. In SIGSPATIAL ’20: 28th International Conference on Advances in Geographic Information Systems, pages 99–110. ACM, 2020. doi:10.1145/3397536.3422245.
- [4] Kevin Buchin, Maike Buchin, David Duran, Brittany Terese Fasy, Roel Jacobs, Vera Sacristán, Rodrigo I. Silveira, Frank Staals, and Carola Wenk. Clustering trajectories for map construction. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 14:1–14:10. ACM, 2017. doi:10.1145/3139958.3139964.
- [5] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Jorren Hendriks, Erfan Hosseini Sereshgi, Vera Sacristán, Rodrigo I. Silveira, Jorrick Sleijster, Frank Staals, and Carola Wenk. Improved map construction using subtrajectory clustering. In LocalRec’20: Proceedings of the 4th ACM SIGSPATIAL Workshop on Location-Based Recommendations, Geosocial Networks, and Geoadvertising, pages 5:1–5:4. ACM, 2020. doi:10.1145/3423334.3431451.
- [6] Kevin Buchin, Maike Buchin, and Yusu Wang. Exact algorithms for partial curve matching via the Fréchet distance. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 645–654. SIAM, 2009. doi:10.1137/1.9781611973068.71.
- [7] Kevin Buchin, Yago Diez, Tom van Diggelen, and Wouter Meulemans. Efficient trajectory queries under the Fréchet distance (GIS cup). In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 101:1–101:4. ACM, 2017. doi:10.1145/3139958.3140064.
- [8] Kevin Buchin, Anne Driemel, Natasja van de L’Isle, and André Nusser. klcluster: Center-based clustering of trajectories. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 496–499. ACM, 2019. doi:10.1145/3347146.3359111.
- [9] Bernard Chazelle, Ding Liu, and Avner Magen. Sublinear geometric algorithms. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 531–540, 2003. doi:10.1145/780542.780620.
- [10] Bernard Chazelle, Ronitt Rubinfeld, and Luca Trevisan. Approximating the minimum spanning tree weight in sublinear time. SIAM Journal on Computing, 34(6):1370–1379, 2005. doi:10.1137/S0097539702403244.
- [11] Artur Czumaj, Funda Ergün, Lance Fortnow, Avner Magen, Ilan Newman, Ronitt Rubinfeld, and Christian Sohler. Approximating the weight of the Euclidean minimum spanning tree in sublinear time. SIAM Journal on Computing, 35(1):91–109, 2005. doi:10.1137/S0097539703435297.
- [12] Artur Czumaj and Christian Sohler. Estimating the weight of metric minimum spanning trees in sublinear time. SIAM Journal on Computing, 39(3):904–922, 2009. doi:10.1137/060672121.
- [13] Artur Czumaj, Christian Sohler, and Martin Ziegler. Property testing in computational geometry. In European Symposium on Algorithms, pages 155–166. Springer, 2000. doi:10.1007/3-540-45253-2_15.
- [14] Fabian Dütsch and Jan Vahrenhold. A filter-and-refinement-algorithm for range queries based on the Fréchet distance (GIS cup). In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 100:1–100:4. ACM, 2017. doi:10.1145/3139958.3140063.
- [15] Rogers Epstein and Sandeep Silwal. Property testing of lp-type problems. In 47th International Colloquium on Automata, Languages, and Programming, volume 168 of LIPIcs, pages 98:1–98:18, 2020. doi:10.4230/LIPICS.ICALP.2020.98.
- [16] Eldar Fischer and Ilan Newman. Testing of matrix properties. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 286–295, 2001. doi:10.1145/380752.380812.
- [17] Oded Goldreich. Introduction to property testing. Cambridge University Press, 2017. doi:10.1017/9781108135252.
- [18] J.E. Goodman, J. O’Rourke, and C.D. Tóth. Handbook of discrete and computational geometry, third edition. CRC Press LLC, January 2017. doi:10.1201/9781315119601.
- [19] Sariel Har-Peled. Geometric Approximation Algorithms. American Mathematical Society, USA, 2011.
- [20] Jirí Matousek. Lectures on discrete geometry, volume 212 of Graduate texts in mathematics. Springer, 2002.
- [21] Morteza Monemizadeh. Facility Location in the Sublinear Geometric Model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023), volume 275 of LIPIcs, pages 6:1–6:24, 2023. doi:10.4230/LIPIcs.APPROX/RANDOM.2023.6.
- [22] Ketan. Mulmuley. Computational geometry : an introduction through randomized algorithms. Prentice-Hall, Englewood Cliffs, N.J, 1994.
- [23] Dana Ron. Property testing. Combinatorial Optimization – Dordrecht, 9(2):597–643, 2001.
- [24] Dana Ron. Algorithmic and analysis techniques in property testing. Foundations and Trends® in Theoretical Computer Science, 5(2):73–205, 2010. doi:10.1561/0400000029.
