Subtrajectory Clustering and Coverage Maximization in Cubic Time, or Better
Abstract
Many application areas collect unstructured trajectory data. In subtrajectory clustering, one seeks to find patterns in this data using a hybrid combination of segmentation and clustering. We analyze two variants of this problem based on the well-known SetCover and CoverageMaximization problems. In both variants the set system is induced by metric balls under the Fréchet distance centered at polygonal curves. Our algorithms focus on improving the running time of the update step of the generic greedy algorithm by means of a careful combination of sweeps through a candidate space. In the first variant, we are given a polygonal curve of complexity , a distance threshold and a complexity bound , and the goal is to identify a minimum-size set of center curves , where each center curve is of complexity at most and every point on is covered. A point on is covered if it is part of a subtrajectory of such that there is a center whose Fréchet distance to is at most . We present an approximation algorithm for this problem with a running time of , where is the size of an optimal solution. The algorithm gives a bicriteria approximation guarantee that relaxes the Fréchet distance threshold by a constant factor and the size of the solution by a factor of . The second problem variant asks for the maximum fraction of the input curve that can be covered using center curves, where is a parameter to the algorithm. For this variant, our techniques lead to an algorithm with a running time of and similar approximation guarantees. Note that in both algorithms and hence the running time is cubic, or better if .
Keywords and phrases:
Clustering, Set cover, Fréchet distance, Approximation algorithms
Funding:
Jacobus Conradi: Partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 313421352 (FOR 2535 Anticipating Human Behavior) and the iBehave Network: Sponsored by the Ministry of Culture and Science of the State of North Rhine-Westphalia. Affiliated with Lamarr Institute for Machine Learning and Artificial Intelligence.
Copyright and License:
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Trajectory analysis is a broad field ranging from gait analysis of full-body motion trajectories [16, 20, 13], via traffic analysis [21] and epidemiological hotspot detection [25] to Lagrangian analysis of particle simulations [24]. Owing to the availability of cheap tracking technology, vast amounts of unstructured spatio-temporal data are collected. Clustering is a fundamental problem in this area and can be formulated under various similarity metrics, among them the Fréchet distance. However, the unstructured nature of trajectory data poses the additional challenge that the data first needs to be segmented before the trajectory segments, which we call subtrajectories, can be clustered in a metric space. Subtrajectory clustering aims to find clusters of subtrajectories that represent complex patterns that reoccur along the trajectories [1, 8, 17], such as commuting patterns in traffic data.
We study two variants of a problem posed by Akitaya, Brüning, Chambers, and Driemel [3], which are based on the well-known set cover and coverage maximization problems. Both rely on the definition of a geometric set system built using metric balls under the Fréchet distance. Given a polygonal curve, we are asked to produce a set of “center” curves that either
1. has minimum cardinality and covers the entirety of the input curve, or
2. covers as much as possible of the input curve under cardinality constraints.
In either setting, a point of the input curve is considered "covered" if there is a subcurve of the input curve that contains it and has small Fréchet distance to some center curve. Note that a point may be covered by multiple center curves. For the precise formulation refer to Section 2. This formulation extends immediately to a set of input curves – instead of one – where one is now tasked with covering the entirety (respectively as much as possible) of all points on all input curves combined.
In this paper we study the continuous variant of the problem where we need to cover the continuous set of points given by the input polygonal curve. It is instructive, however, to first consider a discrete variant of the set cover instance as follows. Given a discrete point sequence , consider the set of contiguous subsequences of the form . We say is contained in the set defined by if there exist such that is similar to . Note that a full specification of this set system has size in the worst case. This phenomenon is shared with the continuous setting, where all known discretizations lead to a cubic-size set system. In this paper, we examine the question of whether we can obtain subcubic-time algorithms by representing the set system only implicitly.
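The discrete variant above can be made concrete with a small sketch. All names here are hypothetical and the similarity predicate is a plug-in; the point is to see the worst-case cubic size of an explicitly stored set system, and the generic greedy set cover algorithm that later sections accelerate.

```python
# Illustrative sketch of the discrete set-cover variant (names hypothetical).
# Each candidate "center" is a contiguous subsequence s[i:j]; the set it
# induces contains every index k lying inside some subsequence s[a:b]
# that is judged similar to s[i:j].

from itertools import combinations

def build_set_system(s, similar):
    """Map each candidate (i, j) to the set of indices it covers.

    The explicit output has worst-case cubic size in len(s): O(n^2)
    candidates, each covering up to O(n) indices -- the blow-up the
    paper avoids by representing the set system implicitly.
    """
    n = len(s)
    system = {}
    for i, j in combinations(range(n + 1), 2):
        covered = set()
        for a, b in combinations(range(n + 1), 2):
            if similar(s[i:j], s[a:b]):
                covered.update(range(a, b))
        system[(i, j)] = covered
    return system

def greedy_set_cover(universe, system):
    """Standard greedy: repeatedly pick the set adding the most coverage."""
    uncovered, solution = set(universe), []
    while uncovered:
        best = max(system, key=lambda c: len(system[c] & uncovered))
        if not system[best] & uncovered:
            raise ValueError("universe not coverable")
        solution.append(best)
        uncovered -= system[best]
    return solution
```

For example, with exact equality as the similarity predicate, the candidate `(0, 2)` of the sequence `[1, 2, 1, 2]` covers all four indices, since `[1, 2]` reoccurs at positions 2 to 4.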
2 Preliminaries
An edge is a map defined by two points as the linear interpolation . A polygonal curve is a map defined by an ordered set as the result of concatenating the edges , where is the edge from to . Any point is said to lie on the edge if lies in the preimage of . Any point lies on at most two edges; it lies on exactly two if it is the endpoint of an edge which coincides with the start point of . The number of points defining the polygonal curve is its complexity, and we denote it by . We denote the set of all polygonal curves in of complexity at most by . For any polygonal curve and values we define to be the subcurve of from to , which itself is a polygonal curve of complexity at most . For two polygonal curves and we define the continuous Fréchet distance to be where and range over all non-decreasing surjective functions from to .
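The continuous Fréchet distance defined above is computed via dynamic programming over the parametric space. As a hedged illustration (not the algorithm used in this paper), here is the classical discrete Fréchet distance of Eiter and Mannila, which couples only vertices and upper-bounds the continuous distance for densely sampled curves:

```python
# Discrete Fréchet distance via memoized recursion (Eiter-Mannila).
# This is a stand-in illustration: it only couples the vertices of the
# two curves, so it upper-bounds the continuous Fréchet distance.

from functools import lru_cache
from math import dist

def discrete_frechet(p, q):
    """Discrete Fréchet distance between vertex sequences p and q."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = dist(p[i], q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        # best of the three monotone predecessor couplings
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(p) - 1, len(q) - 1)
```

For two parallel unit-separated segments sampled at three vertices each, the value is exactly the separation 1.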
Definition 1 (-coverage [3]).
Let be a polygonal curve in , and let and be given. For any , define the -coverage of on (refer to Figure 1) as
For a curve we denote the -coverage by .
Problem 1: Subtrajectory Covering (SC).
Let be a polygonal curve in , and let and be given. The task is to identify a set of curves in minimizing such that .
Definition 2 (Bicriteria approximation for SC).
Let be a polygonal curve in , and let and be given. Let be a set of minimum cardinality such that . Let be given. An algorithm that outputs a set of size such that is called an -approximation for SC.
Problem 2: Subtrajectory Coverage Maximization (SCM).
Let be a polygonal curve in , and let , and be given. The task is to identify a set of curves in maximizing the Lebesgue measure .
Definition 3 (Bicriteria approximation for SCM).
Let be a polygonal curve in , and let and be given. Let be a set of size such that is maximum. Let and be given. An algorithm that outputs a set of size such that is called an -approximation for SCM.
2.1 Prior work on Subtrajectory Covering and Coverage Maximization
We first discuss prior work on the SC problem. The SC problem was introduced by Akitaya, Brüning, Chambers, and Driemel [3]. Their results were later improved by Brüning, Conradi, and Driemel [4]. Both works identified a curve that approximates the input , such that any solution to the problem induces an approximate solution of similar size consisting of only subcurves of . This set of subcurves defines a set system which turns out to have low VC-dimension. This enables randomized set cover algorithms with improved approximation factors and expected running time in , where corresponds to the arc length of the curve . While the complexity of the main algorithm in [3] depends on the relative arc length of the input curve, the latter work [4] also identified a set of points on that define subcurves of complexity which induce the set system of the SetCover instance, resulting in an expected running time of , where hides polylogarithmic factors. The approximation quality of both approaches depends on , as well as on the VC-dimension of the set system. A drawback of these algorithms is that they only generate cluster centers that are line segments, and that they are randomized with large constants in the approximation factors. Conradi and Driemel [9] took a different approach by focusing on the greedy set cover algorithm, which successively adds the candidate curve maximizing the added coverage to a partial solution until the entirety of is covered. Their algorithm computes candidate curves that have complexity up to instead of . Applying the greedy algorithm, this results in a deterministic -approximation algorithm with running time . Recently, van der Hoog, van der Horst, and Ophelders [23] showed that the cardinality of the candidate curve set can be further reduced to a size of . They improve the quality of the approximating curve, resulting in a -approximation with running time in .
As for the SCM problem, by standard submodular function maximization arguments [19, 15], the greedy algorithm used in all these approaches [3, 4, 9, 23] yields an -approximation. Using this reduction, the candidate curve set identified in [23] yields a -approximation to the SCM problem with a running time of .
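The submodular-maximization reduction mentioned above reduces to the generic greedy routine for cardinality-constrained coverage maximization, which achieves a (1 − 1/e)-factor (Nemhauser, Wolsey, and Fisher). The sketch below shows this generic routine, not the paper's accelerated version; all names are illustrative.

```python
# Generic greedy for cardinality-constrained coverage maximization:
# pick the set with the largest marginal coverage k times. By submodularity
# of coverage this is a (1 - 1/e)-approximation to the optimum.

def greedy_max_coverage(sets, k):
    """Pick at most k sets (by index) maximizing the size of their union."""
    covered, picked = set(), []
    for _ in range(k):
        gain, best = 0, None
        for idx, s in enumerate(sets):
            g = len(s - covered)  # marginal coverage of set idx
            if g > gain:
                gain, best = g, idx
        if best is None:  # no remaining set adds coverage
            break
        picked.append(best)
        covered |= sets[best]
    return picked, covered
```

The expensive part is the marginal-gain evaluation in every round; the sweep techniques of Sections 4 and 5 exist precisely to avoid recomputing these gains from scratch.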
2.2 Structure and contributions
In Section 3 we describe how to obtain a set of candidate center curves based on known transformations [23]. Unlike the recent line of work discussed above [3, 4, 9, 23], we do not improve any further on the size of the set . Instead, our focus lies on the update step of the greedy set cover algorithm. In this step, we have to find the element that maximizes the coverage added to a partial solution. In Section 4 we describe our first contribution: a partition of the set into few ordered lists, called sweep-sequences, each of which we process via a sweep algorithm that computes their coverage while reusing prior computations. Unfortunately, the coverage does not immediately allow efficient maintenance along a sweep-sequence. Here, we introduce our second contribution: a candidate-specific approximation of the coverage, called the proxy coverage. In Section 5, we describe how sweep-sequences allow efficient maintenance of the proxy coverage, and in Theorem 26 we describe the culmination of our techniques enabling our algorithm: for a weighted set of points we are able to compute the total weight of points that lie inside the proxy coverage of every element in the sweep-sequence in logarithmic time per element.
In Section 6 we present two -approximation algorithms for Subtrajectory Covering. The algorithms discretize the ground-set by the arrangement (of size ) of the intervals representing the coverage of each element in . Computing the arrangement together with standard greedy set cover techniques and the subroutine from Theorem 26 results in a first approximation algorithm for SC with running time in (see Corollary 30). Our third contribution is a sampling technique improving the running time: we identify a representative subset (comparable to a sampling) of size roughly of the arrangement. This enables a faster algorithm by first covering the representative subset and with it almost the entire arrangement in an initial pass. The remaining elements of the arrangement are then identified and covered resulting in the following theorem.
Theorem 4.
There is a -approximation for SC. Given a polygonal curve of complexity , and , its running time is in , where is the size of the smallest subset such that .
As trivially , Theorem 4 yields an algorithm with (near-)cubic running time. Further, in the case that , it yields an algorithm with subcubic running time. This compares favorably to the best known algorithm with running time [23].
In Section 7, we sketch how our techniques can be applied to obtain faster algorithms for the SCM problem which compares favorably to the best known algorithm with running time [23]. All omitted proofs and discussions can be found in the full version.
Theorem 5.
Let . There is a -approximation algorithm for SCM, where is the base of the natural logarithm. Given a polygonal curve of complexity , , and , its running time is in .
2.3 Related work
Prior to the line of work on the SC and SCM problem discussed in Section 2.1, Buchin et al. [7] presented one of the earlier works on subtrajectory clustering for both the discrete and continuous Fréchet distance. Their work focuses on finding the largest subtrajectory cluster, where different variants of “largest” are considered. They present hardness results for -approximations for any and a matching polynomial time -approximation algorithm. Gudmundsson and Wong [12] present a cubic lower bound conditioned on SETH for the same problem and show that this lower bound is tight. In subsequent experimental work [11, 5, 6, 8] the formulation from [7] was studied, but these works do not offer any theoretical guarantees.
Agarwal et al. [1] base their formulation of the subtrajectory clustering problem on FacilityLocation. In this formulation there is an opening cost associated with every center curve, a cost for every point on the input that is assigned to a cluster, and a cost for points that are not assigned. They present conditional NP-hardness results and an -approximation for well-behaved classes of curves under the discrete Fréchet distance.
3 Set system discretization
In this section we show that, given a curve of complexity , there is a curve of complexity and a set of at most subcurves of , each of complexity at most , such that for any set of curves there is a subset of size such that . This result is known and follows the line of research in [4, 9, 23]. All of these approaches compute a “maximally simplified” version of , see also [10]. Our definition of a simplification follows the notion of pathlet-preserving simplifications from [23], as it yields the best constants.
Definition 6.
Let a polygonal curve , , and be given. A curve is a simplification of if , and for any curve and with there is a subcurve with and .
Theorem 7 ([23]).
For any curve of complexity a simplification of of complexity can be computed in time.
By definition of the simplification of and the triangle inequality for any curve there is a subcurve of of complexity at most such that .
Definition 8 (Free space diagram).
Let and be two polygonal curves parametrized over . The free space diagram of and is defined as the joint parametric space together with a not necessarily uniform grid, where each vertical grid line corresponds to a vertex of and each horizontal grid line to a vertex of . The -free space (refer to Figure 2) of and is defined as
This is the set of points in the parametric space, whose corresponding points on and are at a distance at most . The edges of and segment the free space diagram into cells. We call the intersection of with the boundary of cells the free space intervals.
As was noted by Alt and Godau [2], the Fréchet distance of and is at most if and only if there is a monotone (in both - and -direction) path from to in the -free space of and . They further observed that the free space inside any cell is described by an ellipse intersected with the cell and thus is convex and has constant complexity. Observe that two subcurves and have Fréchet distance at most if and only if there is a monotone (in both - and -direction) path from to in the -free space of and .
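The basic primitive behind this decision procedure is per-cell and constant-time: the free space interval on one cell boundary is the parameter range on an edge lying within the distance threshold of a fixed vertex of the other curve, i.e. the solution set of a quadratic. A minimal sketch of that primitive, with hypothetical names:

```python
# Free space interval on a cell boundary: the range of parameters t in
# [0, 1] such that the point a + t*(b - a) on edge a->b lies within
# distance eps of the fixed point p. Solving |a + t(b-a) - p|^2 <= eps^2
# is a quadratic in t, so this takes constant time per cell boundary.

from math import sqrt

def free_interval(a, b, p, eps):
    """Return (lo, hi) within [0, 1], or None if the interval is empty."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    ex, ey = a[0] - p[0], a[1] - p[1]
    A = dx * dx + dy * dy
    B = 2 * (dx * ex + dy * ey)
    C = ex * ex + ey * ey - eps * eps
    if A == 0:  # degenerate (zero-length) edge
        return (0.0, 1.0) if C <= 0 else None
    disc = B * B - 4 * A * C
    if disc < 0:
        return None
    r = sqrt(disc)
    lo, hi = (-B - r) / (2 * A), (-B + r) / (2 * A)
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return (lo, hi) if lo <= hi else None
```

Intersecting the per-boundary intervals reachable by monotone paths yields the Alt–Godau decision procedure; the extremal points of Definition 9 are read off the same per-cell geometry.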
Definition 9 (Extremal points [4]).
As the -free space of two curves and is convex in any cell , the set of points of minimizing the -coordinate in are described by either a single point or a vertical line segment. We call either the single point or the start and end point of the line segment the leftmost points of in . Similarly has at most two bottommost, rightmost and topmost points. The set of all these at most eight points of in every cell defines the set of extremal points of .
Similar to [23], we define specific subcurves via the extremal points of the -free space.
Definition 10 (-, - and -subcurves).
Let be a polygonal curve and let be a simplification of . Let and be given. Define the following three types of subcurves of via the (sorted) set of -coordinates of extremal points in . For every edge let be the subset of corresponding to the extremal points in (refer to Figure 3).
-
:
A -subcurve of is a subcurve of that starts at some th vertex of and ends at the th vertex for .
-
:
A -subcurve of is either a subcurve or of an edge of defined by a vertex of and a value or its reversal or .
-
:
A -subcurve of is either a subcurve of an edge of defined by two values such that , or its reversal .
The set containing all Type -, - and -subcurves, which we denote by , describes the set of candidate curves, similar to the candidate set from [23].
Theorem 11 (adapted from [23]).
Let be a polygonal curve, and let be a simplification of . Then for any curve there is a set of size at most such that
4 Sweep-sequences and the proxy coverage
In this section we define an ordering of Type and Type subcurves in the set and introduce an approximate -coverage called the proxy coverage. Later on, we show that a coarse representation of the proxy coverage can be maintained along the ordering.
Throughout this section we are given a polygonal curve of complexity , values and , as well as a simplification of . As for any edge , there are Type -, Type - and Type -subcurves in .
Definition 12 (Sweep-sequence).
Let be a sorted list of values in . We say is a sweep-sequence of if is a list of tuples of such that either for all consecutive tuples and it holds that , , and or for all consecutive tuples it holds that , , and .
We are able to construct few sweep-sequences that already capture the entirety of .
Lemma 13.
Let be an edge of . In time , one can compute a set of sweep-sequences of , each of length , such that for any Type - or Type -subcurve on the edge there is a tuple in one of the sweep-sequences in such that if (refer to Figure 4), and if . Further, for the first and last pair in every sweep-sequence it holds that .
We focus our analysis on sweep-sequences of where for every tuple it holds that (refer to Figure 4). This analysis carries over to all sweep-sequences of by setting , and thus to all sweep-sequences in .
We now turn to defining the proxy coverage on which approximates the -coverage.
Definition 14.
Let be an edge of . Define to be the function mapping any to the -coordinate of the leftmost point of the th cell in at height . If this point does not exist, . Similarly define to be the -coordinate of the rightmost point at height , and otherwise.
We assume that for every edge and every the functions and of the free space can be evaluated in time.
Definition 15 (Combinatorial representation).
Let be an edge of and let be given. The combinatorial representation of the -coverage of is the set of all inclusionwise-maximal pairs of indices such that there are points and on the th and th edge of respectively such that there is a monotone path from to in . An index pair includes if and .
The combinatorial representation can be partitioned into two sets, the global group consisting of all index-pairs such that and the local group consisting of all index-pairs .
Observation 16.
Let be a subcurve of an edge of . Then (refer to Figure 5)
Let an edge of together with be given. We say an index is bad for , if all topmost points in cell of the free space of and are to the left of both and , and both and are to the left of all bottommost points in cell . Call this set of bad indices . If , is said to be good for . Intuitively, an index is bad if the free space inside the cell is a diagonal from the top left to the bottom right. For Lemma 21 to hold, the definition of a bad index is stronger and depends on and .
Observation 17.
Let be good for and . If then .
We now turn to defining our central tool, the proxy coverage, which approximates the -coverage and is easy to maintain. It is defined via a reduced combinatorial representation.
Definition 18 (Reduced global group).
Let be an edge of and let be given. Based on the global group we define the reduced global group which results from after merging all pairs of index pairs and if either , or and is good for (see Figure 5).
Definition 19 (Proxy coverage).
For edge of , and define
With these at hand, define the proxy coverage of subedges of as the union
where by a slight abuse of notation let be all index-pairs in such that is not in .
We observe that the proxy coverage can be expressed as a disjoint union via .
Lemma 20.
Let be an edge and let . Then is the disjoint union (refer to Figure 6)
Lemma 21.
If is bad for , then is good for and .
Lemma 22.
Let be an edge of and let be given. Then (refer to Figure 6)
We extend the proxy coverage to also be defined for Type -subcurves of , where we set . Overall, we see that for any – not just Type - and -subcurves of – there are at most two elements with
Theorem 23.
Let be a polygonal curve, and let be a simplification of . Then for any curve there is a set of size at most such that
Proof.
This follows from the definition of the proxy coverage, Lemma 22 and Theorem 11.
5 Maintaining the proxy coverage along a sweep-sequence
In this section we present an algorithm that, given , maintains a symbolic representation of in (roughly) logarithmic time per element in . To this end we maintain the set of inclusion-wise maximal index-pairs such that there is a path from cell at height to cell at some height , which helps maintain and . We want to highlight that maintaining and as a symbolic representation of the -coverage can take total time in (Figure 7), motivating our techniques.
Theorem 24.
At the beginning and end of the loop in Lines of Maintain of Algorithm 1, the set of index-pairs coincides with , coincides with and coincides with . Further, executing Maintain takes time.
Corollary 25.
Let be a sweep-sequence. There are (not necessarily distinct) index-pairs together with contiguous subsets such that for every it holds that
Further for the index is either always good or always bad for for , and the index is either always good or always bad for for . The index-pair as well as the values and can be computed for all in total time .
Overall, we can maintain a symbolic representation of during a sweep-sequence in total time which enables the use of the maintained sets for batch point-queries.
Theorem 26.
Let , and let together with be a weighted point-set. For any let denote its weight. There is an algorithm which computes for every in time.
Proof.
Define the following values for all , along the sweep-sequence where for :
The value corresponds to the weight of points in cell that lie right of . Similarly, corresponds to the weight of points in cell that lie left of . The value corresponds to the weight of points in cell that lie in between and if is good for . Lastly corresponds to the total weight of points that lie on edge . Then for any and any it holds that . Similarly, for any it holds that . Then, by Lemma 20
Observe that can be provided via a data structure that first computes for every in total time and stores them in a balanced binary tree as leaves, where every inner node stores the sum of the values of its children. For every the value can then be returned in time by identifying in time all maximal subtrees whose leaves lie in the interval and then returning the sum of the stored values in the root of each maximal subtree.
Next we show that (resp. and ) can correctly be maintained when performing the sweep of . To this end let
Refer to Figure 8. As the free space in every cell is convex, throughout the first indices are monotone and the -coordinates of the left-most and right-most points are stored in , for every the contiguous subsets
can be computed in time. Similarly the set is also a contiguous subset of and can be computed in time. Let denote all that lie on the th edge of . Then for any
and thus consists of contiguous disjoint subsets of . Thus all sets (represented as contiguous subsets of ) can be computed in time . Further,
and hence can be maintained in total time after one initial computation of all , by adding for whenever enters the contiguous disjoint subsets of , and subtracting for whenever exits the contiguous disjoint subsets of . Sorting the boundaries of all , preparing them for the maintenance of , takes time. Similarly and can be maintained.
Overall, the values , , and are correctly maintained in total time such that they can be evaluated in . By Corollary 25 there are only total updates to and . Hence, can be correctly maintained along by updating it whenever , , , , or change. Thus computing for all takes time.
6 Ground-set discretization and sampling
We now present two -approximation algorithms for SC that use the efficient batch point queries from Theorem 26.
Definition 27 (Atomic intervals).
Let be a polygonal curve, and let and be given. Let be a simplification of . Let be the set of all intersection-points of horizontal lines at height for -coordinates of extremal points of with the boundary of the free space . From this, define the set of atomic intervals as the set of intervals describing the arrangement of defined by the set (Figure 9).
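As an illustration of this discretization (names hypothetical): the endpoints of the coverage intervals cut the ground set into atomic intervals, and since no endpoint lies in the interior of an atomic interval, covering each midpoint covers the whole atomic interval.

```python
# Sketch of the ground-set discretization: the interval endpoints induce
# an arrangement of "atomic" intervals on [lo, hi]; a coverage interval
# either contains an atomic interval entirely or misses its interior,
# so covering the midpoints suffices.

def atomic_midpoints(intervals, lo, hi):
    """Midpoints of the arrangement of [lo, hi] induced by interval endpoints."""
    cuts = sorted({lo, hi, *[x for iv in intervals for x in iv if lo < x < hi]})
    return [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
```

For instance, the intervals (0.2, 0.6) and (0.4, 1.0) cut [0, 1] into four atomic intervals with midpoints 0.1, 0.3, 0.5, and 0.8.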
Observation 28.
As , each horizontal line intersects at most cells, and the free space in every cell is convex, it follows that the set of all midpoints of atomic intervals is a point set of size such that for any it holds that
Lemma 29.
Let be a polygonal curve of complexity and let and be given. Let be a simplification of . Let be the extremal points of . Let be a given set of at most points. For every edge of let be the set of atomic intervals in such that . Let be the size of the smallest set such that . There is an algorithm that computes a set of size such that in time
Proof Sketch.
The algorithm operates in rounds. In every round we start with a subset of , where initially , and in every round for the partial solution of the greedy algorithm it holds that . Further, in every round and for every edge of we maintain the set of atomic intervals in such that . These weights in combination with Theorem 26 allow us to identify the Type - or -subedge of maximizing . We similarly identify the Type -curve maximizing by initially computing for every Type -curve and then in every round recomputing . Afterwards we add the subcurve maximizing to . Lastly we update . As a result, almost all intervals in whose weights are updated have their weight set to zero and are thus removed from . The number of updates setting such a weight to zero, together with the initial application of Theorem 26, dominates the running time. Lastly, by standard greedy SetCover arguments [14, 22, 18] and Theorem 23, the number of rounds until the algorithm terminates () is bounded by .
Corollary 30.
There is an -approximation algorithm for SC. Given a polygonal curve of complexity , and , its running time is in .
Proof.
Lemma 29 together with Observation 28 and Theorem 23 yields the claim.
The running time of Corollary 30 is dominated by the size of . We now present techniques enabling identification of a representative subset of .
Theorem 31.
Let lists be given, each containing sorted values such that all values are distinct and in every list and for every identifying the item at position takes time and also for given determining the maximal index such that the th item is less than takes time. Then for every in time one can determine values such that
-
1.
for every there is an such that and
-
2.
for all .
Proof Sketch.
We recursively compute a value such that both a constant fraction of values among all lists is smaller than and a constant fraction of values among all lists is bigger than . To find we compute the middle element of every list in time. Then is (roughly) the middle element of these middle elements.
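The selection step of this proof sketch resembles the classical median-of-middles argument. The following illustrative sketch (not the paper's exact routine) returns one balanced pivot, the weighted median of the lists' middle elements, which guarantees that roughly a quarter of all values lies on either side:

```python
# One recursion step of the splitter computation: given sorted lists,
# find a pivot x such that a constant fraction of all values is <= x
# and a constant fraction is >= x. Lists whose middle element is <= x
# contribute at least half their elements below x (and symmetrically),
# which yields the ~1/4 fraction on each side.

def balanced_pivot(lists):
    """Return a value x balanced against the union of the sorted lists."""
    # weight each list's middle element by the list's length
    mids = sorted((l[len(l) // 2], len(l)) for l in lists if l)
    total = sum(w for _, w in mids)
    acc = 0
    for v, w in mids:
        acc += w
        if acc * 2 >= total:  # weighted median of the middle elements
            return v
    return mids[-1][0]
```

Recursing on the side still containing too many values yields the splitters of Theorem 31 with logarithmically many probes per list.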
Lemma 32.
For every in time one can determine so called -coarse intervals partitioning , each containing at most intervals of .
Proof Sketch.
By the structure of we can compute how many atomic intervals of are contained on any edge of in time. Similarly, on any edge of we can compute the th atomic interval boundary via binary searches of . Hence after precomputation time per edge of , we can output the th interval boundary of in time for any . Hence Theorem 31 implies the claim.
We now use Lemma 32 together with Lemma 29 to obtain the following lemma and theorem. The overarching approach of the algorithm is as follows: First use Lemma 32 to compute roughly points from . We cover these points with a set of elements from in time via Lemma 29. Between any two consecutive of these points there are at most other points from , and at most of these sets are not covered by . This leaves us with uncovered points in . We again invoke Lemma 29, covering these remaining points with a set of elements from in time .
Lemma 33.
Let and be given. Let be a set covering all midpoints of -coarse intervals of size . Then there are atomic intervals in and atomic intervals in that are not contained in . They can be computed in time.
Theorem 4. [Restated, see original statement.]
There is a -approximation for SC. Given a polygonal curve of complexity , and , its running time is in , where is the size of the smallest subset such that .
Proof Sketch.
The algorithm maintains an internal guess of , which initially is . We describe a subroutine that decides, whether or outputs a solution of size . If the subroutine outputs that we set , which then implies the claim.
First set . Next compute a set of -coarse intervals via Lemma 32. From them compute their midpoints and for every edge of compute the subset of containing such a midpoint. Then is the set of atomic intervals in with . The algorithm then attempts to cover the mid-points of these intervals via rounds of the algorithm from Lemma 29. If the algorithm does not terminate after rounds we return that . Otherwise we compute the set of atomic intervals and for every the set of atomic intervals in that are not fully contained in the solution returned from the first call of the algorithm from Lemma 29 via Lemma 33. We then invoke the algorithm from Lemma 29 on the midpoints of and which is precisely the set of atomic intervals in with . As and thus , the running time of the subroutine is
And thus overall the running time of the algorithm is
7 Subtrajectory Coverage Maximization
The techniques discussed in this paper can also be used to obtain the following result.
Theorem 5. [Restated, see original statement.]
Let . There is a -approximation algorithm for SCM, where is the base of the natural logarithm. Given a polygonal curve of complexity , , and , its running time is in .
Proof Sketch.
We first compute an -approximation of the -free space, which allows describing as a sum of linear functions. A sum of linear functions is a linear function itself. As we can maintain a combinatorial description of the proxy coverage we are able to parsimoniously maintain this linear function during a scan of a sweep-sequence in , which allows evaluating for all elements in a single sweep-sequence in time roughly . Using this subroutine once per round and sweep-sequence results in a running time of .
References
- [1] Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, and Erin Taylor. Subtrajectory clustering: Models and algorithms. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018, PODS ’18, pages 75–87, 2018. doi:10.1145/3196959.3196972.
- [2] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl., 5:75–91, 1995. doi:10.1142/S0218195995000064.
- [3] Frederik Brüning, Hugo A. Akitaya, Erin W. Chambers, and Anne Driemel. Subtrajectory clustering: Finding set covers for set systems of subcurves. Comput. Geom. Topol., 2(1):1:1–1:48, February 2023. doi:10.57717/cgt.v2i1.7.
- [4] Frederik Brüning, Jacobus Conradi, and Anne Driemel. Faster approximate covering of subcurves under the fréchet distance. In 30th Annual European Symposium on Algorithms (ESA 2022), volume 244 of Leibniz International Proceedings in Informatics (LIPIcs), pages 28:1–28:16, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ESA.2022.28.
- [5] Kevin Buchin, Maike Buchin, David Duran, Brittany Terese Fasy, Roel Jacobs, Vera Sacristan, Rodrigo I. Silveira, Frank Staals, and Carola Wenk. Clustering trajectories for map construction. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’17, 2017. doi:10.1145/3139958.3139964.
- [6] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Jorren Hendriks, Erfan Hosseini Sereshgi, Vera Sacristán, Rodrigo I. Silveira, Jorrick Sleijster, Frank Staals, and Carola Wenk. Improved map construction using subtrajectory clustering. In LocalRec’20: Proceedings of the 4th ACM SIGSPATIAL Workshop on Location-Based Recommendations, Geosocial Networks, and Geoadvertising, LocalRec@SIGSPATIAL 2020, November 3, 2020, Seattle, WA, USA, pages 5:1–5:4, 2020. doi:10.1145/3423334.3431451.
- [7] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Maarten Löffler, and Jun Luo. Detecting commuting patterns by clustering subtrajectories. International Journal of Computational Geometry and Applications, 21(3):253–282, 2011. doi:10.1142/S0218195911003652.
- [8] Maike Buchin, Bernhard Kilgus, and Andrea Kölzsch. Group diagrams for representing trajectories. International Journal of Geographical Information Science, 34(12):2401–2433, 2020. doi:10.1080/13658816.2019.1684498.
- [9] Jacobus Conradi and Anne Driemel. Finding complex patterns in trajectory data via geometric set cover, 2023. doi:10.48550/arXiv.2308.14865.
- [10] Mark de Berg, Atlas F. Cook IV, and Joachim Gudmundsson. Fast fréchet queries. Computational Geometry, 46(6):747–755, 2013. doi:10.1016/j.comgeo.2012.11.006.
- [11] Joachim Gudmundsson and Nacho Valladares. A GPU approach to subtrajectory clustering using the fréchet distance. IEEE Trans. Parallel Distributed Syst., 26(4):924–937, 2015. doi:10.1109/TPDS.2014.2317713.
- [12] Joachim Gudmundsson and Sampson Wong. Cubic upper and lower bounds for subtrajectory clustering under the continuous fréchet distance, 2021. doi:10.48550/arXiv.2110.15554.
- [13] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE transactions on pattern analysis and machine intelligence, 36(7):1325–1339, 2013. doi:10.1109/TPAMI.2013.248.
- [14] David S Johnson. Approximation algorithms for combinatorial problems. In Proceedings of the fifth annual ACM symposium on Theory of computing, pages 38–49, 1973. doi:10.1145/800125.804034.
- [15] Andreas Krause and Daniel Golovin. Submodular function maximization. In Lucas Bordeaux, Youssef Hamadi, and Pushmeet Kohli, editors, Tractability: Practical Approaches to Hard Problems, pages 71–104. Cambridge University Press, 2014. doi:10.1017/CBO9781139177801.004.
- [16] Lily Lee and W Eric L Grimson. Gait analysis for recognition and classification. In Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pages 155–162. IEEE, 2002. doi:10.1109/AFGR.2002.1004148.
- [17] Anqi Liang, Bin Yao, Bo Wang, Yinpei Liu, Zhida Chen, Jiong Xie, and Feifei Li. Sub-trajectory clustering with deep reinforcement learning. VLDB J., 33(3):685–702, 2024. doi:10.1007/s00778-023-00833-w.
- [18] László Lovász. On the ratio of optimal integral and fractional covers. Discrete mathematics, 13(4):383–390, 1975. doi:10.1016/0012-365X(75)90058-8.
- [19] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978. doi:10.1007/BF01588971.
- [20] Sen Qiao, Y. Wang, and J. Li. Real-time human gesture grading based on OpenPose. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pages 1–6, 2017. doi:10.1109/CISP-BMEI.2017.8301910.
- [21] Roniel S. De Sousa, Azzedine Boukerche, and Antonio A. F. Loureiro. Vehicle trajectory similarity: Models, methods, and applications. ACM Comput. Surv., 53(5), September 2020. doi:10.1145/3406096.
- [22] Sherman K Stein. Two combinatorial covering theorems. Journal of Combinatorial Theory, Series A, 16(3):391–397, 1974. doi:10.1016/0097-3165(74)90062-4.
- [23] Ivor van der Hoog, Thijs van der Horst, and Tim Ophelders. Faster and deterministic subtrajectory clustering, 2024. doi:10.48550/arXiv.2402.13117.
- [24] Erik van Sebille, Stephen M. Griffies, Ryan Abernathey, Thomas P. Adams, Pavel Berloff, Arne Biastoch, Bruno Blanke, Eric P. Chassignet, Yu Cheng, Colin J. Cotter, Eric Deleersnijder, Kristofer Döös, Henri F. Drake, Sybren Drijfhout, Stefan F. Gary, Arnold W. Heemink, Joakim Kjellsson, Inga Monika Koszalka, Michael Lange, Camille Lique, Graeme A. MacGilchrist, Robert Marsh, C. Gabriela Mayorga Adame, Ronan McAdam, Francesco Nencioli, Claire B. Paris, Matthew D. Piggott, Jeff A. Polton, Siren Rühs, Syed H.A.M. Shah, Matthew D. Thomas, Jinbo Wang, Phillip J. Wolfram, Laure Zanna, and Jan D. Zika. Lagrangian ocean analysis: Fundamentals and practices. Ocean Modelling, 121:49–75, 2018. doi:10.1016/j.ocemod.2017.11.008.
- [25] Yiqun Xie, Shashi Shekhar, and Yan Li. Statistically-robust clustering techniques for mapping spatial hotspots: A survey. ACM Comput. Surv., 55(2), January 2022. doi:10.1145/3487893.
