
Subtrajectory Clustering and Coverage Maximization in Cubic Time, or Better

Jacobus Conradi, University of Bonn, Germany
Anne Driemel, University of Bonn, Germany
Abstract

Many application areas collect unstructured trajectory data. In subtrajectory clustering, one is interested in finding patterns in this data using a hybrid combination of segmentation and clustering. We analyze two variants of this problem based on the well-known SetCover and CoverageMaximization problems. In both variants the set system is induced by metric balls under the Fréchet distance centered at polygonal curves. Our algorithms focus on improving the running time of the update step of the generic greedy algorithm by means of a careful combination of sweeps through a candidate space. In the first variant, we are given a polygonal curve P of complexity n, a distance threshold Δ and a complexity bound ℓ, and the goal is to identify a minimum-size set of center curves 𝒞, where each center curve is of complexity at most ℓ and every point p on P is covered. A point p on P is covered if it is part of a subtrajectory π_p of P such that there is a center c ∈ 𝒞 whose Fréchet distance to π_p is at most Δ. We present an approximation algorithm for this problem with a running time of 𝒪((n² + √k_Δ·n^{5/2}) log² n), where k_Δ is the size of an optimal solution. The algorithm gives a bicriterial approximation guarantee that relaxes the Fréchet distance threshold by a constant factor and the size of the solution by a factor of 𝒪(log n). The second problem variant asks for the maximum fraction of the input curve P that can be covered using k center curves, where k ≤ n is a parameter to the algorithm. For this variant, our techniques lead to an algorithm with a running time of 𝒪((k+ℓ)n² log² n) and similar approximation guarantees. Note that in both algorithms k, k_Δ ∈ O(n), and hence the running time is cubic, or better if k ≪ n.

Keywords and phrases:
Clustering, Set cover, Fréchet distance, Approximation algorithms
Funding:
Jacobus Conradi: Partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 313421352 (FOR 2535 Anticipating Human Behavior) and the iBehave Network: Sponsored by the Ministry of Culture and Science of the State of North Rhine-Westphalia. Affiliated with Lamarr Institute for Machine Learning and Artificial Intelligence.
Anne Driemel: Affiliated with Lamarr Institute for Machine Learning and Artificial Intelligence.
Copyright and License:
© Jacobus Conradi and Anne Driemel; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Related Version:
Full Version: https://arxiv.org/abs/2504.17381
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

Trajectory analysis is a broad field ranging from gait analysis of full-body motion trajectories [16, 20, 13], via traffic analysis [21] and epidemiological hotspot detection [25], to Lagrangian analysis of particle simulations [24]. With the current availability of cheap tracking technology, vast amounts of unstructured spatio-temporal data are collected. Clustering is a fundamental problem in this area and can be formulated under various similarity metrics, among them the Fréchet distance. However, the unstructured nature of trajectory data poses the additional challenge that the data first needs to be segmented before the trajectory segments, which we call subtrajectories, can be clustered in a metric space. Subtrajectory clustering aims to find clusters of subtrajectories that represent complex patterns reoccurring along the trajectories [1, 8, 17], such as commuting patterns in traffic data.

We study two variants of a problem posed by Akitaya, Brüning, Chambers, and Driemel [3], which are based on the well-known set cover and coverage maximization problems. Both rely on the definition of a geometric set system built using metric balls under the Fréchet distance. Given a polygonal curve, we are asked to produce a set of “center” curves that either

  1. has minimum cardinality and covers the entirety of the input curve, or

  2. covers as much as possible of the input curve under cardinality constraints.

In either setting a point p of the input curve is considered "covered" if there is a subcurve π_p of the input curve that contains p and has small Fréchet distance to some center curve. Note that a point p may be covered by multiple center curves. For the precise formulation refer to Section 2. This formulation extends immediately to a set of input curves – instead of one – where one is now tasked to cover the entirety (respectively as much as possible) of all points on all input curves combined.

In this paper we study the continuous variant of the problem where we need to cover the continuous set of points given by the input polygonal curve. It is instructive, however, to first consider a discrete variant of the set cover instance as follows. Given a discrete point sequence A = a₁,…,a_n, consider the set of contiguous subsequences of the form A_{ij} = a_i,…,a_j. We say a_k is contained in the set defined by A_{ij} if there exist i′ ≤ k ≤ j′ such that A_{i′j′} is similar to A_{ij}. Note that a full specification of this set system has size Θ(n³) in the worst case. This phenomenon is shared with the continuous setting, where all known discretizations lead to a cubic-size set system. In this paper, we examine the question of whether we can obtain subcubic-time algorithms by internally representing the set system implicitly.
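To make the cubic worst case concrete, the following sketch (our own illustration, not part of the problem statement) builds the discrete set system explicitly for a one-dimensional point sequence, with the discrete Fréchet distance as an assumed similarity measure:

```python
def discrete_frechet(a, b):
    """Discrete Fréchet distance between two 1D point sequences (standard DP)."""
    n, m = len(a), len(b)
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                D[i][j] = d
            elif i == 0:
                D[i][j] = max(D[0][j - 1], d)
            elif j == 0:
                D[i][j] = max(D[i - 1][0], d)
            else:
                D[i][j] = max(min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]), d)
    return D[-1][-1]


def explicit_set_system(A, delta):
    """Explicit membership lists: the set defined by A_ij contains a_k iff
    some contiguous A_i'j' with i' <= k <= j' is delta-similar to A_ij."""
    n = len(A)
    system = {}
    for i in range(n):
        for j in range(i, n):
            system[(i, j)] = [
                k for k in range(n)
                if any(discrete_frechet(A[i:j + 1], A[ip:jp + 1]) <= delta
                       for ip in range(k + 1) for jp in range(k, n))
            ]
    return system
```

For a constant sequence every subsequence is similar to every other, so each of the Θ(n²) sets contains all n points and the explicit specification has size Θ(n³); the algorithms in this paper avoid ever writing this system down.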

2 Preliminaries

An edge is a map e: [0,1] → ℝ^d defined by two points p, q ∈ ℝ^d as the linear interpolation t ↦ p + t(q − p). A polygonal curve is a map P: [0,1] → ℝ^d defined by an ordered set of points (p₁,…,p_n) in ℝ^d as the result of concatenating the edges (e₁,…,e_{n−1}), where e_i is the edge from p_i to p_{i+1}. Any point P(t) is said to lie on the edge e_i if t lies in the preimage of e_i. Any point P(t) lies on at most two edges, exactly if it defines the endpoint of edge e_i which coincides with the start point of e_{i+1}. The number of points n defining the polygonal curve P is its complexity, and we denote it by |P|. The set of all polygonal curves in ℝ^d of complexity at most ℓ we denote by 𝕏_ℓ^d. For any polygonal curve P and values 0 ≤ s ≤ t ≤ 1 we define P[s,t] to be the subcurve of P from s to t, which itself is a polygonal curve of complexity at most |P|. For two polygonal curves P and Q we define the continuous Fréchet distance as d_F(P,Q) = inf_{f,g} max_{t∈[0,1]} ‖P(f(t)) − Q(g(t))‖, where f and g range over all non-decreasing surjective functions from [0,1] to [0,1].

Definition 1 (Δ-coverage [3]).

Let P be a polygonal curve in ℝ^d, and let Δ > 0 and ℓ ≥ 2 be given. For any 𝒬 ⊆ 𝕏_ℓ^d, define the Δ-coverage of 𝒬 on P (refer to Figure 1) as

Cov_P(𝒬,Δ) = ⋃_{Q∈𝒬} ⋃_{0≤s≤t≤1, d_F(P[s,t],Q)≤Δ} [s,t] ⊆ [0,1].

For a curve Q ∈ 𝕏_ℓ^d we denote the Δ-coverage Cov_P({Q},Δ) by Cov_P(Q,Δ).

Figure 1: a): Example of all points on P that lie on a subcurve of P that has Fréchet distance at most Δ to a curve Q of complexity 3. b): The set Cov_P(Q,Δ) ⊆ [0,1].

Problem 1: Subtrajectory Covering (SC).

Let P be a polygonal curve in ℝ^d, and let Δ > 0 and ℓ ≥ 2 be given. The task is to identify a set 𝒬 of curves in 𝕏_ℓ^d minimizing |𝒬| such that Cov_P(𝒬,Δ) = [0,1].

Definition 2 (Bicriteria approximation for SC).

Let P be a polygonal curve in ℝ^d, and let Δ > 0 and ℓ ≥ 2 be given. Let C* ⊆ 𝕏_ℓ^d be a set of minimum cardinality such that Cov_P(C*,Δ) = [0,1]. Let α, β ≥ 1 be given. An algorithm that outputs a set C ⊆ 𝕏_ℓ^d of size at most α|C*| such that Cov_P(C,βΔ) = [0,1] is called an (α,β)-approximation for SC.

Problem 2: Subtrajectory Coverage Maximization (SCM).

Let P be a polygonal curve in ℝ^d, and let Δ > 0, ℓ ≥ 2 and k ≥ 1 be given. The task is to identify a set 𝒬 of k curves in 𝕏_ℓ^d maximizing the Lebesgue measure of Cov_P(𝒬,Δ).

Definition 3 (Bicriteria approximation for SCM).

Let P be a polygonal curve in ℝ^d, and let Δ > 0 and ℓ ≥ 2 be given. Let C* ⊆ 𝕏_ℓ^d be a set of size k such that the Lebesgue measure of Cov_P(C*,Δ) is maximum. Let α > 0 and β ≥ 1 be given. An algorithm that outputs a set C ⊆ 𝕏_ℓ^d of size k such that the Lebesgue measure of Cov_P(C,βΔ) is at least α times that of Cov_P(C*,Δ) is called an (α,β)-approximation for SCM.

2.1 Prior work on Subtrajectory Covering and Coverage Maximization

We first discuss prior work on the SC problem. The SC problem was introduced by Akitaya, Brüning, Chambers, and Driemel [3]. Their results were later improved by Brüning, Conradi, and Driemel [4]. Both works identified a curve S that approximates the input P, such that any solution to the problem induces an approximate solution of similar size consisting of only subcurves of S. This set of subcurves defines a set system which turns out to have low VC-dimension. This enables randomized set cover algorithms with improved approximation factors and expected running time in 𝒪(n(k_Δ)³(log⁴(Λ/Δ) + log³(n/k_Δ)) + n log² n), where Λ corresponds to the arc length of the curve P. While the complexity of the main algorithm in [3] depends on the relative arc length of the input curve, the latter work [4] also identified a set of points on S that define 𝒪(n³) subcurves of complexity 2, which induce the set system of the SetCover instance, resulting in an expected running time of 𝒪̃(k_Δ n³), where 𝒪̃(·) hides polylogarithmic factors. The approximation quality of both approaches depends on k_Δ as well as on the VC-dimension of the set system. A drawback of these algorithms is that they generate cluster centers that are line segments only, and that the algorithm is randomized with large constants in the approximation factors. Conradi and Driemel [9] took a different approach by focusing on the greedy set cover algorithm, which successively adds the candidate curve maximizing the added coverage to a partial solution until the entirety of [0,1] is covered. Their algorithm computes 𝒪(n³) candidate curves that have complexity up to ℓ instead of 2. Applying the greedy algorithm, this results in a deterministic (𝒪(log n), 11)-approximation algorithm with running time 𝒪̃(k_Δ n⁴ + n⁴ℓ²). Recently, van der Hoog, van der Horst and Ophelders [23] showed that the cardinality of the candidate curve set can be further reduced to a size of 𝒪(n² log n). They improve the quality of the approximating curve S, resulting in an (𝒪(log n), 4)-approximation with running time in 𝒪̃(k_Δ n³).

As for the SCM problem, by standard submodular function maximization arguments [19, 15], the greedy algorithm used in all these approaches [3, 4, 9, 23] yields an (𝒪(1),𝒪(1))-approximation. Using this reduction, the candidate curve set identified in [23] yields an (𝒪(1),4)-approximation to the SCM problem with a running time of 𝒪̃(k n³).
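The greedy skeleton shared by these approaches can be sketched generically on a finite set system; the instance and names below are our own toy illustration, with coverage maximization simply stopping after k rounds:

```python
def greedy_max_coverage(sets, k):
    """Generic greedy coverage maximization: in each of k rounds, add the
    set with the largest marginal coverage with respect to the partial
    solution. Returns the chosen keys and the covered ground elements."""
    covered, chosen = set(), []
    for _ in range(k):
        # the "update step": find the candidate maximizing added coverage
        key = max(sets, key=lambda s: len(sets[s] - covered))
        if not sets[key] - covered:
            break  # nothing left to gain
        chosen.append(key)
        covered |= sets[key]
    return chosen, covered
```

By submodularity of coverage, this greedy rule yields the constant-factor (1 − 1/e) guarantee referenced above [19, 15]; the contribution of this paper is making the update step fast for the implicitly represented geometric set system.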

2.2 Structure and contributions

In Section 3 we describe how to obtain a set of 𝒪̃(n²) candidate center curves 𝒞 based on known transformations [23]. Unlike the recent line of work discussed above [3, 4, 9, 23], we do not improve any further on the size of the set 𝒞. Instead, our focus lies on the update step of the greedy set cover algorithm. In this step, we have to find the element c ∈ 𝒞 that maximizes the coverage added to a partial solution. In Section 4 we describe our first contribution: a partition of the set 𝒞 into few ordered lists, called sweep-sequences, each of which we process via a sweep algorithm that computes their coverage while reusing prior computations. Unfortunately, the coverage does not immediately allow efficient maintenance along a sweep-sequence. Here, we introduce our second contribution: a candidate-specific approximation of the coverage, called the proxy coverage. In Section 5, we describe how sweep-sequences allow efficient maintenance of the proxy coverage, and with Theorem 26 describe the culmination of our techniques enabling our algorithm: for a weighted set A ⊂ [0,1] of points we are able to compute the total weight of points that lie inside the proxy coverage of every element in the sweep-sequence in logarithmic time per element.

In Section 6 we present two (𝒪(log n), 4)-approximation algorithms for Subtrajectory Covering. The algorithms discretize the ground set [0,1] by the arrangement (of size 𝒪̃(n³)) of the intervals representing the coverage of each element in 𝒞. Computing the arrangement together with standard greedy set cover techniques and the subroutine from Theorem 26 results in a first approximation algorithm for SC with running time in 𝒪̃(|𝒞|n + k_Δ|𝒞|) = 𝒪̃(n³) (see Corollary 30). Our third contribution is a sampling technique improving the running time: we identify a representative subset (comparable to a sample) of the arrangement of size roughly 𝒪(√k_Δ·n^{3/2}). This enables a faster algorithm by first covering the representative subset, and with it almost the entire arrangement, in an initial pass. The remaining 𝒪(√k_Δ·n^{5/2}) elements of the arrangement are then identified and covered, resulting in the following theorem.

Theorem 4.

There is a (96 ln(n) + 128, 4)-approximation for SC. Given a polygonal curve P of complexity n, Δ > 0 and ℓ ≤ n, its running time is in 𝒪((n² + √k_Δ·n^{5/2}) log² n), where k_Δ is the size of the smallest subset C ⊆ 𝕏_ℓ^d such that Cov_P(C,Δ) = [0,1].

As trivially k_Δ ≤ n, Theorem 4 yields an algorithm with (near-)cubic running time. Further, in the case that k_Δ ∈ 𝒪(n^{1−ε}), it yields an algorithm with subcubic running time. This compares favorably to the best known algorithm with running time 𝒪̃(k_Δ n³) [23].

In Section 7, we sketch how our techniques can be applied to obtain faster algorithms for the SCM problem, which compare favorably to the best known algorithm with running time 𝒪̃(k n³) [23]. All omitted proofs and discussions can be found in the full version.

Theorem 5.

Let ε ∈ (0, 1/5]. There is an ((e−1)/(16e), 4+ε)-approximation algorithm for SCM, where e is the base of the natural logarithm. Given a polygonal curve P of complexity n, Δ > 0, ℓ ≤ n and k > 0, its running time is in 𝒪((k+ℓ)n²ε^{−2} log² n log ε^{−1}).

2.3 Related work

Prior to the line of work on the SC and SCM problems discussed in Section 2.1, Buchin et al. [7] presented one of the earlier works on subtrajectory clustering for both the discrete and continuous Fréchet distance. Their work focuses on finding the largest subtrajectory cluster, where different variants of "largest" are considered. They present hardness results for (2−ε)-approximations for any ε > 0 and a matching polynomial-time 2-approximation algorithm. Gudmundsson and Wong [12] present a cubic lower bound conditioned on SETH for the same problem and show that this lower bound is tight. In subsequent experimental work [11, 5, 6, 8] the formulation from [7] was studied, but these works do not offer any theoretical guarantees.

Agarwal et al. [1] base their formulation of the subtrajectory clustering problem on FacilityLocation. In this formulation there is an opening cost associated with every center curve, a cost for every point on the input that is assigned to a cluster, and a cost for points that are not assigned. They present conditional NP-hardness results and an 𝒪(log² n)-approximation for well-behaved classes of curves under the discrete Fréchet distance.

3 Set system discretization

In this section we show that, given a curve P of complexity n, there is a curve S of complexity 𝒪(n) and a set 𝒞_S(P) of at most 𝒪(n² log n) subcurves of S, each of complexity at most ℓ+2, such that for any set of curves C ⊆ 𝕏_ℓ^d there is a subset Ĉ ⊆ 𝒞_S(P) of size 𝒪(|C|) such that Cov_P(C,Δ) ⊆ Cov_P(Ĉ,4Δ). This result is known and follows the line of research in [4, 9, 23]. All of these approaches compute a "maximally simplified" version S of P, see also [10]. Our definition of a simplification follows the notion of pathlet-preserving simplifications from [23], as it yields the best constants.

Definition 6.

Let a polygonal curve P, Δ ≥ 0, and ℓ ≥ 1 be given. A curve S is a simplification of P if d_F(S,P) ≤ 2Δ, and for any curve Q ∈ 𝕏_ℓ^d and subcurve P[s,t] with d_F(Q,P[s,t]) ≤ Δ there is a subcurve S[s′,t′] with d_F(P[s,t],S[s′,t′]) ≤ 2Δ and |S[s′,t′]| ≤ ℓ+2.

Theorem 7 ([23]).

For any curve P of complexity n a simplification S of P of complexity 𝒪(n) can be computed in 𝒪(nlogn) time.

By definition of the simplification S of P and the triangle inequality, for any curve π ∈ 𝕏_ℓ^d there is a subcurve π̂ of S of complexity at most ℓ+2 such that Cov_P(π,Δ) ⊆ Cov_P(π̂,4Δ).

Figure 2: Illustration of the Δ-free space of two curves P and Q as well as their extremal points.
Definition 8 (Free space diagram).

Let S and P be two polygonal curves parametrized over [0,1]. The free space diagram of S and P is defined as the joint parametric space [0,1]² together with a not necessarily uniform grid, where each vertical grid line corresponds to a vertex of P and each horizontal grid line to a vertex of S. The Δ-free space (refer to Figure 2) of S and P is defined as

𝒟_Δ(S,P) = {(x,y) ∈ [0,1]² ∣ ‖P(x) − S(y)‖ ≤ Δ}.

This is the set of points in the parametric space whose corresponding points on S and P are at distance at most Δ. The edges of S and P segment the free space diagram into cells. We call the intersection of 𝒟_Δ(S,P) with the boundary of cells the free space intervals.

As was noted by Alt and Godau [2], the Fréchet distance of S and P is at most Δ if and only if there is a monotone (in both x- and y-direction) path from (0,0) to (1,1) in the Δ-free space of S and P. They further observed that the free space inside any cell is described by an ellipse intersected with the cell and is thus convex and of constant complexity. Observe that two subcurves P[a,c] and S[b,d] have Fréchet distance at most Δ if and only if there is a monotone (in both x- and y-direction) path from (a,b) to (c,d) in the Δ-free space of S and P.
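Concretely, each free space interval from Definition 8 is the intersection of a disk of radius Δ around a vertex of one curve with an edge of the other curve. A minimal sketch for d = 2 (our own helper, with illustrative names) computes one such interval by solving the resulting quadratic in the edge parameter t:

```python
import math

def free_space_interval(p1, p2, q, delta):
    """Interval of t in [0,1] with ||p1 + t*(p2 - p1) - q|| <= delta,
    i.e. the Δ-free space on one cell boundary; None if empty."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    fx, fy = p1[0] - q[0], p1[1] - q[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - delta * delta
    if a == 0:  # degenerate edge: p1 == p2
        return (0.0, 1.0) if c <= 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    lo, hi = (-b - root) / (2 * a), (-b + root) / (2 * a)
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return (lo, hi) if lo <= hi else None
```

Each of the four boundaries of a cell contributes at most one such interval, which is consistent with the convexity of the free space within a cell.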

Definition 9 (Extremal points [4]).

As the Δ-free space of two curves S and P is convex in any cell C, the set of points of 𝒟Δ(S,P) minimizing the x-coordinate in C are described by either a single point or a vertical line segment. We call either the single point or the start and end point of the line segment the leftmost points of 𝒟Δ(S,P) in C. Similarly C has at most two bottommost, rightmost and topmost points. The set of all these at most eight points of 𝒟Δ(S,P) in every cell defines the set of extremal points of 𝒟Δ(S,P).

Similar to [23], we define specific subcurves via the extremal points of the 4Δ-free space.

Figure 3: Illustration of all Type (I)-, (II)- and (III)-subcurves of Q (as vertical lines) that are not reversals, induced by the 4Δ-free space of Q and P from Figure 2. Further denoted are the values ε_j which induce the set of Type (III)-subcurves on the first edge of Q.
Definition 10 ((I)-, (II)- and (III)-subcurves).

Let P be a polygonal curve and let S be a simplification of P. Let Δ and ℓ be given. Define the following three types of subcurves of S via the (sorted) set ℰ(S,P) of y-coordinates of extremal points in 𝒟_{4Δ}(S,P). For every edge e let ℰ(e,P) be the subset of ℰ(S,P) corresponding to the extremal points in 𝒟_{4Δ}(e,P) (refer to Figure 3).

  1. (I):

    A (I)-subcurve of S is a subcurve of S that starts at some ith vertex of S and ends at the (i+j)th vertex for j ∈ {2^m ∣ 0 ≤ m ≤ ⌈log₂(ℓ)⌉}.

  2. (II):

    A (II)-subcurve of S is either a subcurve e[0,ε_i] or e[ε_i,1] of an edge e of S defined by a vertex of e and a value ε_i ∈ ℰ(e,P), or its reversal rev(e[0,ε_i]) or rev(e[ε_i,1]).

  3. (III):

    A (III)-subcurve of S is either a subcurve e[ε_i,ε_{i+j}] of an edge e of S defined by two values ε_i, ε_{i+j} ∈ ℰ(e,P) such that j ∈ {2^m ∣ 0 ≤ m ≤ ⌈log₂(|ℰ(e,P)|)⌉}, or its reversal rev(e[ε_i,ε_{i+j}]).

The set containing all Type (I)-, (II)- and (III)-subcurves, which we denote by 𝒞_S(P), describes the set of candidate curves, similar to the candidate set from [23].

Theorem 11 (adapted from [23]).

Let P be a polygonal curve, and let S be a simplification of P. Then for any curve π ∈ 𝕏_ℓ^d there is a set S_π ⊆ 𝒞_S(P) of size at most 8 such that

Cov_P(π,Δ) ⊆ ⋃_{s∈S_π} Cov_P(s,4Δ).

4 Sweep-sequences and the proxy coverage

In this section we define an ordering of Type (II) and Type (III) subcurves in the set 𝒞S(P) and introduce an approximate Δ-coverage called the proxy coverage. Later on, we show that a coarse representation of the proxy coverage can be maintained along the ordering.

Throughout this section we are given a polygonal curve P of complexity n, values Δ > 0 and ℓ ≤ n, as well as a simplification S of P. As |ℰ(e,P)| = 𝒪(n) for any edge e, there are 𝒪(n log ℓ) Type (I)-, 𝒪(n²) Type (II)- and 𝒪(n² log n) Type (III)-subcurves in 𝒞_S(P).

Figure 4: Illustration of three of the eight sweep-sequences in 𝔖e of the edge e that are constructed for Type (III)-subcurves. From left to right one for j=4, one for j=2 and one for j=1.
Definition 12 (Sweep-sequence).

Let E = (e₁,…) be a sorted list of values in ℝ. We say 𝔰 is a sweep-sequence of E if 𝔰 is a list of tuples of elements of E such that either for all consecutive tuples (e_a,e_b) and (e_c,e_d) it holds that a ≤ b, c ≤ d, 0 ≤ a−c ≤ 1 and 0 ≤ b−d ≤ 1, or for all consecutive tuples it holds that a ≤ b, c ≤ d, 0 ≤ c−a ≤ 1 and 0 ≤ d−b ≤ 1.
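Definition 12 can be checked mechanically on the index pairs; a small sketch (our own, operating on indices into E rather than on the values e_a themselves):

```python
def is_sweep_sequence(pairs):
    """Check Definition 12 on a list of index pairs (a, b) into a sorted
    list E: every pair must satisfy a <= b, and consecutive pairs move
    each of the two indices by at most one, all in the same direction."""
    if any(a > b for a, b in pairs):
        return False
    steps = [(a - c, b - d) for (a, b), (c, d) in zip(pairs, pairs[1:])]
    forward = all(0 <= da <= 1 and 0 <= db <= 1 for da, db in steps)
    backward = all(0 <= -da <= 1 and 0 <= -db <= 1 for da, db in steps)
    return forward or backward
```

The sliding-window behaviour this enforces (both endpoints advance monotonically, one step at a time) is what makes the sweep maintenance in Section 5 possible.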

We are able to construct few sweep-sequences that already capture the entirety of 𝒞S(P).

Lemma 13.

Let e be an edge of S. In time 𝒪(n log n), one can compute a set 𝔖_e of 𝒪(log n) sweep-sequences of ℰ(e,P), each of length 𝒪(n), such that for any Type (II)- or Type (III)-subcurve π ∈ 𝒞_S(P) on the edge e there is a tuple (ε_i,ε_j) in one of the sweep-sequences in 𝔖_e such that π = e[ε_i,ε_j] if ε_i ≤ ε_j (refer to Figure 4), and π = rev(e[ε_j,ε_i]) if ε_i > ε_j. Further, for the first and last pair (e_a,e_b) in every sweep-sequence it holds that a = b.

We focus our analysis on sweep-sequences of ℰ(e,P) where for every tuple (ε_a,ε_b) it holds that a ≤ b (refer to Figure 4). This analysis carries over to all sweep-sequences of ℰ(e,P) by setting e ← rev(e), and thus to all sweep-sequences in 𝔖_e.

We now turn to defining the proxy coverage on P which approximates the 4Δ-coverage.

Definition 14.

Let e be an edge of S. Define l_i(·) to be the function mapping any y to the x-coordinate of the leftmost point of the ith cell in 𝒟_{4Δ}(e,P) at height y. If this point does not exist, l_i(y) = ⊥. Similarly define r_j(y) to be the x-coordinate of the rightmost point at height y, and ⊥ otherwise.

We assume that for every edge e and every i < n the functions l_i(·) and r_i(·) of the free space 𝒟_{4Δ}(e,P) can be evaluated in 𝒪(1) time.

Definition 15 (Combinatorial representation).

Let e be an edge of S and let 0 ≤ s ≤ t ≤ 1 be given. The combinatorial representation ℛ(e[s,t]) of the 4Δ-coverage Cov_P(e[s,t],4Δ) of e[s,t] is the set of all inclusion-wise maximal pairs of indices (i,j) such that there are points s′ and t′ on the ith and jth edge of P respectively such that there is a monotone path from (s′,s) to (t′,t) in 𝒟_{4Δ}(e,P). An index pair (i,j) includes (i′,j′) if i ≤ i′ and j′ ≤ j.

The combinatorial representation can be partitioned into two sets, the global group 𝒢(e[s,t]) consisting of all index-pairs (i,j) ∈ ℛ(e[s,t]) such that i < j, and the local group ℒ(e[s,t]) consisting of all index-pairs (i,i) ∈ ℛ(e[s,t]).

Observation 16.

Let e[s,t] be a subcurve of an edge e of S. Then (refer to Figure 5)

Cov_P(e[s,t],4Δ) = ⋃_{(i,j)∈ℛ(e[s,t])} [l_i(s), r_j(t)] = ⋃_{(i,j)∈ℒ(e[s,t])∪𝒢(e[s,t])} [l_i(s), r_j(t)].

Let an edge e of S together with 0 ≤ s ≤ t ≤ 1 be given. We say an index i is bad for e[s,t] if all topmost points in cell i of the free space of e and P are to the left of both l_i(s) and r_i(t), and both l_i(s) and r_i(t) are to the left of all bottommost points in cell i. Call this set of bad indices ℬ(e[s,t]). If i ∉ ℬ(e[s,t]), then i is said to be good for e[s,t]. Intuitively, an index is bad if the free space inside the cell is a diagonal from the top left to the bottom right. For Lemma 21 to hold, the definition of a bad index is stronger and depends on s and t.

Observation 17.

Let i be good for e[s,t] and P. If l_i(s) ≠ ⊥ ≠ r_i(t), then l_i(s) ≤ r_i(t).

We now turn to defining our central tool, the proxy coverage, which approximates the Δ-coverage and is easy to maintain. It is defined via a reduced combinatorial representation.

Definition 18 (Reduced global group).

Let e be an edge of S and let 0 ≤ s ≤ t ≤ 1 be given. Based on the global group 𝒢(e[s,t]) we define the reduced global group 𝒢̃(e[s,t]), which results from 𝒢(e[s,t]) by merging all pairs of index pairs (a,b) and (c,d) with either a < c < b < d, or b = c where the index b = c is good for e[s,t] (see Figure 5).

Figure 5: Illustration of the 4Δ-coverage of e[s,t]. The global group of e[s,t] is {(1,4),(4,5)}. The reduced global group of e[s,t] is also {(1,4),(4,5)}, as 4 is bad for e[s,t].
Definition 19 (Proxy coverage).

For an edge e of S, 0 ≤ s ≤ t ≤ 1 and 1 ≤ i,j < n define

l̂_{i,e[s,t]}(s) = l_i(s) if i is good for e[s,t], and l̂_{i,e[s,t]}(s) = r_i(s) if i is bad for e[s,t];
r̂_{j,e[s,t]}(t) = r_j(t) if j is good for e[s,t], and r̂_{j,e[s,t]}(t) = l_j(t) if j is bad for e[s,t].

With these at hand, define the proxy coverage of subedges of S as the union

Ĉov_P(e[s,t]) = (⋃_{(i,i)∈ℒ(e[s,t])∖ℬ(e[s,t])} [l_i(s), r_i(t)]) ∪ (⋃_{(i,j)∈𝒢̃(e[s,t])} [l̂_{i,e[s,t]}(s), r̂_{j,e[s,t]}(t)]),

where by a slight abuse of notation we let ℒ(e[s,t])∖ℬ(e[s,t]) be the set of all index-pairs (i,i) in ℒ(e[s,t]) such that i is not in ℬ(e[s,t]).

We observe that the proxy coverage can be expressed as a disjoint union via 𝒢̃(e[s,t]).

Lemma 20.

Let e be an edge and let 0 ≤ s ≤ t ≤ 1. Then Ĉov_P(e[s,t]) is the disjoint union (refer to Figure 6)

Ĉov_P(e[s,t]) = (⋃_{i∈ℒ(e[s,t])∖ℬ(e[s,t])} [l_i(s), r_i(t)]) ∪ (⋃_{(i,j)∈𝒢̃(e[s,t])} [l̂_{i,e[s,t]}(s), r̂_{j,e[s,t]}(t)]).
Lemma 21.

If i is bad for e[s,t], then i is good for rev(e[s,t]) and l_i(s), r_i(t) ∈ [l_i(t), r_i(s)].

Lemma 22.

Let e be an edge of S and let 0 ≤ s ≤ t ≤ 1 be given. Then (refer to Figure 6)

Cov_P(e[s,t],4Δ) ⊆ Ĉov_P(e[s,t]) ∪ Ĉov_P(rev(e[s,t])) ⊆ Cov_P({e[s,t], rev(e[s,t])}, 4Δ).
Figure 6: Illustration of the proxy coverage of e[s,t] and rev(e[s,t]) compared to the 4Δ-coverage of e[s,t]. Cells with bad index for e[s,t] and rev(e[s,t]) are highlighted in red.

We extend the proxy coverage Ĉov to also be defined for Type (I)-subcurves of S, where we set Ĉov_P(S[s,t]) = Cov_P(S[s,t],4Δ). Overall, we see that for any π ∈ 𝒞_S(P) – not just Type (II)- and (III)-subcurves of S – there are at most two elements π₁, π₂ ∈ 𝒞_S(P) with

Cov_P(π,4Δ) ⊆ Ĉov_P(π₁) ∪ Ĉov_P(π₂) ⊆ Cov_P({π₁,π₂},4Δ).
Theorem 23.

Let P be a polygonal curve, and let S be a simplification of P. Then for any curve π ∈ 𝕏_ℓ^d there is a set S_π ⊆ 𝒞_S(P) of size at most 16 such that

Cov_P(π,Δ) ⊆ ⋃_{s∈S_π} Ĉov_P(s).

Proof.

This follows from the definition of the proxy coverage, Lemma 22 and Theorem 11.

Figure 7: Illustration of how the combinatorial description of the 4Δ-coverage of two neighbouring elements in a sweep-sequence may differ by up to n index-pairs. Instead, Ĉov(c_i) = ∅ for all i.

5 Maintaining the proxy coverage along a sweep-sequence

In this section we present an algorithm that, given 𝔰 ∈ 𝔖_e, maintains a symbolic representation of Ĉov_P(e[s,t]) in (roughly) logarithmic time per element in 𝔰. To this end we maintain the set 𝒰(e[s,t]) of inclusion-wise maximal index-pairs (i,j) such that there is a path from cell i at height s to cell j at some height t̂ ≤ t, which helps maintain 𝒢̃(e[s,t]) and ℒ(e[s,t])∖ℬ(e[s,t]). We want to highlight that maintaining 𝒢(e[s,t]) and ℒ(e[s,t]) as a symbolic representation of the 4Δ-coverage can take total time in Θ(n|𝔰|) (Figure 7), motivating our techniques.

Theorem 24.

At the beginning and end of the loop in Lines 7–15 of Maintain of Algorithm 1, the set of index-pairs 𝒢̃ coincides with 𝒢̃(e[s,t]), ℒ coincides with ℒ(e[s,t])∖ℬ(e[s,t]), and 𝒰 coincides with 𝒰(e[s,t]). Further, executing Maintain takes 𝒪(n log n) time.

Corollary 25.

Let 𝔰 ∈ 𝔖_e be a sweep-sequence. There are m = 𝒪(n) (not necessarily distinct) index-pairs p₁ = (i₁,j₁),…,p_m = (i_m,j_m) together with m contiguous subsets I_i = {(s_{a_i},t_{a_i}),…,(s_{b_i},t_{b_i})} ⊆ 𝔰 such that for every (s,t) ∈ 𝔰 it holds that

𝒢̃(e[s,t]) ∪ (ℒ(e[s,t])∖ℬ(e[s,t])) = ⋃_{1≤i≤m, (s,t)∈I_i} {p_i}.

Further, for p_k = (i_k,j_k) the index i_k is either always good or always bad for e[s,t] for (s,t) ∈ I_k, and the index j_k is either always good or always bad for e[s,t] for (s,t) ∈ I_k. The index-pairs p_i as well as the values a_i and b_i can be computed for all i in total time 𝒪(n log n).

Overall, we can maintain a symbolic representation of Ĉov_P(·) during a sweep-sequence in total time 𝒪(n log n), which enables the use of the maintained sets for batch point-queries.

Algorithm 1 Maintenance of the reduced global group.
Algorithm 2 Advancing the start/end point of the sweeping window.
Theorem 26.

Let 𝔰 ∈ 𝔖_e, and let Q ⊂ [0,1] together with w_Q: Q → ℝ be a weighted point set. For any Q′ ⊆ Q let w_Q(Q′) = Σ_{q∈Q′} w_Q(q) denote its weight. There is an algorithm which computes w_Q(Q ∩ Ĉov_P(e[s,t])) for every (s,t) ∈ 𝔰 in 𝒪(|Q| log |Q| + n log n) time.

Proof.

Define the following values for all i, j along the sweep-sequence 𝔰, where (s,t) ∈ 𝔰:

L_i((s,t)) = Σ_{q∈Q, q in cell i, l̂_{i,e[s,t]}(s)≠⊥, l̂_{i,e[s,t]}(s)≤q} w_Q(q),
R_j((s,t)) = Σ_{q∈Q, q in cell j, r̂_{j,e[s,t]}(t)≠⊥, q≤r̂_{j,e[s,t]}(t)} w_Q(q),
M_i((s,t)) = Σ_{q∈Q, i∉ℬ(e[s,t]), q in cell i, l_i(s)≠⊥≠r_i(t), l_i(s)≤q≤r_i(t)} w_Q(q),
D(i,j) = Σ_{q∈Q, q in cell m, i≤m≤j} w_Q(q).

The value L_i((s,t)) corresponds to the weight of points in cell i that lie to the right of l̂_{i,e[s,t]}(s). Similarly, R_j((s,t)) corresponds to the weight of points in cell j that lie to the left of r̂_{j,e[s,t]}(t). The value M_i((s,t)) corresponds to the weight of points in cell i that lie in between l_i(s) and r_i(t) if i is good for e[s,t]. Lastly, D(i,j) corresponds to the total weight of points that lie on the edges i,…,j. Then for any (s,t) ∈ 𝔰 and any (i,i) ∈ ℒ(e[s,t])∖ℬ(e[s,t]) it holds that M_i((s,t)) = w_Q(Q ∩ [l_i(s), r_i(t)]). Similarly, for any (i,j) ∈ 𝒢̃(e[s,t]) it holds that L_i((s,t)) + D(i+1,j−1) + R_j((s,t)) = w_Q(Q ∩ [l̂_{i,e[s,t]}(s), r̂_{j,e[s,t]}(t)]). Then, by Lemma 20,

w_Q(Q ∩ Ĉov_P(e[s,t])) =
= (Σ_{i∈ℒ(e[s,t])∖ℬ(e[s,t])} w_Q(Q ∩ [l_i(s), r_i(t)])) + (Σ_{(i,j)∈𝒢̃(e[s,t])} w_Q(Q ∩ [l̂_{i,e[s,t]}(s), r̂_{j,e[s,t]}(t)]))
= (Σ_{i∈ℒ(e[s,t])∖ℬ(e[s,t])} M_i((s,t))) + (Σ_{(i,j)∈𝒢̃(e[s,t])} L_i((s,t)) + D(i+1,j−1) + R_j((s,t))).

Observe that D(i,j) can be provided via a data structure that first computes d_i = Σ_{q∈Q, q in cell i} w_Q(q) for every i in total time 𝒪(|Q| log n) and stores these values in a balanced binary tree as leaves, where every inner node stores the sum of the values of its children. For every i ≤ j the value D(i,j) = Σ_{i≤m≤j} d_m can then be returned in 𝒪(log n) time by identifying in 𝒪(log n) time all 𝒪(log n) maximal subtrees whose leaves lie in the interval [i,j] and then returning the sum of the stored values in the roots of these maximal subtrees.
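Such a range-sum tree is standard; a compact sketch (our own, using an array-based layout instead of an explicit pointer tree, with the same 𝒪(log n) query) could look as follows.

```python
class RangeSum:
    """Balanced-binary-tree range sums: leaves hold the values d_i, inner
    nodes the sum of their two children; query(i, j) returns the sum of
    d_i..d_j (inclusive, as in D(i, j)) in O(log n) time."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * self.n + list(values)  # leaves at tree[n..2n)
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]

    def query(self, i, j):
        # climb from the two leaves, summing the O(log n) maximal subtrees
        s = 0
        i += self.n
        j += self.n + 1
        while i < j:
            if i & 1:
                s += self.tree[i]
                i += 1
            if j & 1:
                j -= 1
                s += self.tree[j]
            i >>= 1
            j >>= 1
        return s
```

Building the tree takes linear time in the number of cells, matching the budget of the proof.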

Figure 8: Construction of the set I_q, encoding when l̂_{i,e[s,t]}(s) ≤ q, from the proof of Theorem 26, via the sets L_q and R_q, encoding when l_i(s) ≤ q and q ≤ r_i(s), and the set B_i, for all (s,t) ∈ 𝔰.

Next we show that L_i(·) (resp. R_j(·) and M_i(·)) can correctly be maintained when performing the sweep of 𝔰. To this end let

I_q = {(s,t) ∈ 𝔰 ∣ q is in cell i and l̂_{i,e[s,t]}(s) ≠ ⊥ and l̂_{i,e[s,t]}(s) ≤ q}.

Refer to Figure 8. As the free space in every cell is convex, throughout 𝔰 the first indices are monotone and the y-coordinates of the leftmost and rightmost points are stored in ℰ(e,P), so for every q ∈ Q the contiguous subsets

L_q = {(s,t) ∈ 𝔰 ∣ q is in cell i and l_i(s) ≠ ⊥ and l_i(s) ≤ q} and
R_q = {(s,t) ∈ 𝔰 ∣ q is in cell i and l_i(s) ≠ ⊥ and q ≤ r_i(s)}

can be computed in 𝒪(log n) time. Similarly the set B_i = {(s,t) ∈ 𝔰 ∣ i is bad for e[s,t]} is also a contiguous subset of 𝔰 and can be computed in 𝒪(log n) time. Let Q_i denote all q ∈ Q that lie on the ith edge of P. Then for any q ∈ Q_i

I_q = (L_q ∩ (𝔰∖B_i)) ∪ ((𝔰∖R_q) ∩ B_i),

and thus I_q consists of 𝒪(1) contiguous disjoint subsets of 𝔰. Thus all sets I_q (represented as 𝒪(1) contiguous subsets of 𝔰) can be computed in time 𝒪(|Q| log n + n log n). Further,

L_i((s,t)) = Σ_{q∈Q_i} 𝟙[(s,t) ∈ I_q]·w_Q(q),

and hence L_i(·) can be maintained in total time 𝒪(|Q| + n log n) after one initial computation of all I_q, by adding w_Q(q) for q ∈ Q_i whenever (s,t) enters one of the 𝒪(1) contiguous disjoint subsets of I_q, and subtracting w_Q(q) for q ∈ Q_i whenever (s,t) exits one of them. Sorting the boundaries of all I_q to prepare them for the maintenance of L_i(·) takes 𝒪(|Q| log |Q|) time. Similarly M_i(·) and R_j(·) can be maintained.

Overall, the values L_i(·), R_j(·), M_i(·) and D(i,j) are correctly maintained in total time 𝒪(|Q| log |Q| + n log n) such that they can be evaluated in 𝒪(log n) time. By Corollary 25 there are only 𝒪(n) total updates to 𝒢̃(·) and ℒ(·)∖ℬ(·). Hence, w_Q(·) can be correctly maintained along 𝔰 by updating it whenever L_i(·), R_j(·), M_i(·), 𝒢̃(·), ℒ(·)∖ℬ(·) or ℬ(·) change. Thus computing w_Q(Q ∩ Ĉov_P(e[s,t])) for all (s,t) ∈ 𝔰 takes 𝒪(|Q| log |Q| + n log n) time.
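The enter/exit maintenance in the proof above amounts to a difference array over the sweep positions; a minimal sketch (our own, under the assumption that every I_q is already given as a contiguous range of sweep steps with its weight):

```python
def sweep_weights(num_steps, intervals):
    """intervals: list of (start, end, weight) triples, where [start, end]
    (end inclusive) is the contiguous range of sweep steps in which a point
    contributes its weight. Returns the running total at every step."""
    delta = [0.0] * (num_steps + 1)
    for s, e, w in intervals:
        delta[s] += w       # weight enters at step s
        delta[e + 1] -= w   # and leaves after step e
    out, run = [], 0.0
    for step in range(num_steps):
        run += delta[step]
        out.append(run)
    return out
```

Processing each interval boundary exactly once is what keeps the maintenance linear in |Q| after the initial sorting.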

6 Ground-set discretization and sampling

We now present two (𝒪(log n), 4)-approximation algorithms for SC that use the efficient batch point queries from Theorem 26.

Definition 27 (Atomic intervals).

Let P be a polygonal curve, and let Δ>0 and ℓ be given. Let S be a simplification of P. Let G be the set of all intersection points of the boundary of the free space 𝒟4Δ(S,P) with the horizontal lines at heights h, where h ranges over the y-coordinates of the extremal points of 𝒟4Δ(S,P). From this, define the set of atomic intervals 𝒜(S,P) as the set of intervals of the arrangement of [0,1] defined by the set {x∈[0,1]∣∃y∈[0,1]:(x,y)∈G} (Figure 9).

Figure 9: Illustration of atomic intervals 𝒜(e,P) induced by (e,P).
Observation 28.

As |(S,P)|≤8n2, each horizontal line intersects at most n cells, and the free space in every cell is convex, it follows that the set of all midpoints of atomic intervals in 𝒜(S,P) is a point set A⊂[0,1] of size at most 16n3 such that for any C∈𝒞S(P) it holds that

Cov^P(C)=[0,1] ⟺ A⊆Cov^P(C).
Lemma 29.

Let P be a polygonal curve of complexity n and let Δ>0 and ℓ≤n be given. Let S be a simplification of P. Let (S,P) be the extremal points of 𝒟4Δ(S,P). Let A⊂[0,1] be a given set of at most cn3 points. For every edge e of S let We be the set of atomic intervals a in 𝒜(e,P) such that wA(a)=|A∩a|≠0. Let kΔ be the size of the smallest set 𝒞⊂𝕏d such that CovP(𝒞,Δ)=[0,1]. There is an algorithm that computes a set 𝒞⊆𝒞S(P) of size at most 16(3ln(n)+lnc+1)kΔ such that A⊆Cov^AS(𝒞) in time

𝒪(n2log2n+|A|logn+kΔn2log3n+logn⋅∑e|We|).

Proof Sketch.

The algorithm operates in rounds. In every round we start with a subset A′ of A, where initially A′=A, and in every round, for the partial solution R of the greedy algorithm, it holds that A′=A∖Cov^P(R). Further, in every round and for every edge e of S we maintain the set We of atomic intervals a in 𝒜(e,P) such that wA′(a)=|A′∩a|≠0. These weights in combination with Theorem 26 allow us to identify the Type (II)- or (III)-subedge r of S maximizing |A′∩Cov^P(r)|=|(A∖Cov^P(R))∩Cov^P(r)|. We similarly identify the Type (I)-curve r maximizing |A′∩Cov^P(r)| by initially computing Cov^P(r) for every Type (I)-curve and then recomputing |A′∩Cov^P(r)| in every round. Afterwards we add the subcurve maximizing |A′∩Cov^P(r)| to R and update A′←A′∖Cov^P(r). As a result, almost all intervals in We whose weights are updated have their weight set to zero and are thus removed from We. The number of updates setting such a weight to zero, together with the initial application of Theorem 26, dominates the running time. Lastly, by standard greedy SetCover arguments [14, 22, 18] and Theorem 23 the number of rounds until the algorithm terminates (A′=∅) is bounded by 16(ln(|A|)+1).

Figure 10: Illustration to the Proof Sketch of Lemma 29: modification of non-zero weights when adding r to R. All weights that are updated but not necessarily set to 0 are highlighted.
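For intuition, the round structure of Lemma 29 follows the generic greedy SetCover template. Stripped of the coverage machinery and weight maintenance, it can be sketched as follows (names are ours; candidate coverages are given explicitly here rather than computed via Theorem 26):

```python
def greedy_set_cover(universe, candidates):
    """Generic greedy SetCover: repeatedly pick the candidate covering the
    most not-yet-covered elements until everything is covered.
    candidates: dict name -> set of covered elements.
    By the classic analysis [14, 22, 18] the solution size is within a
    factor ln|universe|+1 of optimal.
    """
    remaining = set(universe)  # plays the role of A'
    solution = []
    while remaining:
        # pick r maximizing |A' ∩ Cov(r)|
        best = max(candidates, key=lambda r: len(remaining & candidates[r]))
        if not remaining & candidates[best]:
            raise ValueError("universe not coverable by candidates")
        solution.append(best)
        remaining -= candidates[best]  # A' <- A' \ Cov(r)
    return solution
```

The actual algorithm additionally amortizes the weight updates: an interval whose weight drops to zero never has to be touched again, which is what makes the total update cost proportional to the number of zeroed weights.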
Corollary 30.

There is a (48ln(n)+64,4)-approximation algorithm for SC. Given a polygonal curve P of complexity n, Δ>0 and ℓ≤n, its running time is in 𝒪(n3log3n).

Proof.

Lemma 29 together with Observation 28 and Theorem 23 yields the claim.

The running time of Corollary 30 is dominated by the size of 𝒜(S,P). We now present techniques to identify a representative subset of 𝒜(S,P).

Theorem 31.

Let n lists Li be given, each containing mi sorted values, such that all values are distinct, and such that in every list Li and for every j≤mi the item at position j can be identified in 𝒪(logmi) time, and for any given v the maximal index j such that the jth item is less than v can be determined in 𝒪(logmi) time. Then for every K≤∑imi one can determine, in 𝒪(Knlogmaximi) time, K values v1<⋯<vK such that

  1. for every x∈⋃iLi there is an i such that x∈[vi,vi+1], and

  2. |[vi,vi+1]∩⋃iLi|=O(∑imi/K) for all i.

Proof Sketch.

We recursively compute a value m such that both a constant fraction of values among all lists is smaller than m and a constant fraction of values among all lists is bigger than m. To find m we compute the middle element of every list in 𝒪(nlogmaximi) time. Then m is (roughly) the middle element of these middle elements.
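For concreteness, the guarantee of Theorem 31 (though not its sublinear running time, which never reads all values) can be illustrated by a naive reference implementation that merges everything and takes evenly spaced splitters; all names here are ours:

```python
def coarse_splitters(lists, K):
    """Naive reference for the splitter guarantee of Theorem 31: merge all
    values and take every (M // K)-th one, so every value lies in some
    [v_i, v_{i+1}] and each such interval contains O(sum(m_i) / K) values.
    The theorem achieves the same guarantee in O(K n log max m_i) time via
    recursive middle-of-middles selection; this sketch only illustrates
    the two properties, not the running time.
    """
    allvals = sorted(v for lst in lists for v in lst)
    M = len(allvals)
    step = max(1, M // K)
    splitters = allvals[::step]
    # include the extremes so that property 1 holds for all values
    if splitters[0] != allvals[0]:
        splitters.insert(0, allvals[0])
    if splitters[-1] != allvals[-1]:
        splitters.append(allvals[-1])
    return splitters
```

Between two consecutive splitters there are at most step−1 = O(∑imi/K) values, matching property 2.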

Lemma 32.

For every α∈[0,3], in 𝒪(n1+αlogn) time one can determine 𝒪(nα) so-called α-coarse intervals partitioning [0,1], each containing at most 𝒪(n3−α) intervals of 𝒜(S,P).

Proof Sketch.

By the structure of 𝒟4Δ(e,P) we can compute how many atomic intervals of 𝒜(e,P) are contained on any edge of P in 𝒪(logn) time. Similarly, on any edge of P we can compute the ith atomic interval boundary via 𝒪(1) binary searches of (e,P). Hence after 𝒪(nlogn) precomputation time per edge e of S, we can output the ith interval boundary of 𝒜(e,P) in 𝒪(logn) time for any i. Hence Theorem 31 implies the claim.

We now use Lemma 32 together with Lemma 29 to obtain the following lemma and theorem. The overarching approach of the algorithm is as follows. First use Lemma 32 to compute roughly 𝒪(n3/2) points from 𝒜(S,P). We cover these points with a set R1 of 𝒪~(kΔ) elements from 𝒞S(P) in time 𝒪~(kΔn5/2) via Lemma 29. Between any two consecutive ones of the 𝒪(n3/2) points there are at most 𝒪(n3/2) other points from 𝒜(S,P), and at most 𝒪~(kΔn) of these groups of points are not covered by R1. This leaves us with 𝒪~(kΔn5/2) uncovered points in 𝒜(S,P). We again invoke Lemma 29, covering these remaining points with a set R2 of 𝒪~(kΔ) elements from 𝒞S(P) in time 𝒪~(kΔn5/2).

Lemma 33.

Let α∈[0,3] and K>0 be given. Let C⊆𝒞S(P) be a set of size 𝒪(Klogn) covering all midpoints of α-coarse intervals. Then there are 𝒪(Kn4−αlogn) atomic intervals in 𝒜(S,P) and 𝒪(Kn4−αlogn+Kn2logn) atomic intervals in ⋃e𝒜(e,P) that are not contained in Cov^AS(C). They can be computed in 𝒪(Kn4−αlog2n+Kn2log2n) time.

Theorem 4. [Restated, see original statement.]

There is a (96ln(n)+128,4)-approximation for SC. Given a polygonal curve P of complexity n, Δ>0 and ℓ≤n, its running time is in 𝒪((n2+kΔ1/2n5/2)log2n), where kΔ is the size of the smallest subset C⊂𝕏d such that CovP(C,Δ)=[0,1].

Proof Sketch.

The algorithm maintains an internal guess K of kΔ, which initially is 1. We describe a subroutine that either decides that K<kΔ or outputs a solution of size 𝒪(Klogn). Whenever the subroutine reports that K<kΔ we set K←2K and repeat, which then implies the claim.

First set α=3/2+logK/(2logn)+loglogn/logn. Next compute a set of α-coarse intervals via Lemma 32. From them compute their midpoints M, and for every edge e of S compute the subset We of 𝒜(e,P) of intervals containing such a midpoint. Then We is the set of atomic intervals in 𝒜(e,P) with wM(⋅)≠0. The algorithm then attempts to cover the midpoints of these intervals via 𝒪(Klogn) rounds of the algorithm from Lemma 29. If the algorithm does not terminate after 𝒪(Klogn) rounds we return that K<kΔ. Otherwise, via Lemma 33 we compute the set of atomic intervals A′⊆𝒜(S,P) and, for every e, the set of atomic intervals W′e⊆𝒜(e,P) that are not fully contained in the solution returned by the first call of the algorithm from Lemma 29. We then invoke the algorithm from Lemma 29 on the midpoints M′ of A′ and the sets W′e, where W′e is precisely the set of atomic intervals in 𝒜(e,P) with wM′(⋅)≠0. As K∈𝒪(kΔ) and thus K∈𝒪(n), the running time of the subroutine is

𝒪(n1+αlogn+Kn4−αlog3n+Kn2log2n)=𝒪(K1/2n5/2log2n).

And thus overall the running time of the algorithm is

𝒪(n2log2n+∑K=1log(kΔ)((2K)1/2n5/2log2n))=𝒪(n2log2n+kΔ1/2n5/2log2n).
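The outer loop over the guesses K is the standard exponential-search (guess-and-double) pattern: since the cost per attempt grows geometrically in K, the last, successful attempt dominates the total cost. A minimal sketch (names are ours):

```python
def guess_and_double(try_with_budget):
    """Exponential search over an unknown optimum: try K = 1, 2, 4, ...
    until the budgeted subroutine succeeds. If an attempt with guess K
    costs O(sqrt(K) * T), the geometric sum over all attempts is bounded
    by O(sqrt(K_final) * T), where K_final is the first successful guess.

    try_with_budget(K) returns a solution, or None meaning the guess
    was too small (K < k_opt).
    """
    K = 1
    while True:
        sol = try_with_budget(K)
        if sol is not None:
            return K, sol
        K *= 2  # guess too small; double and retry
```

For example, if the subroutine first succeeds for guesses of at least 5, the loop tries K = 1, 2, 4 and returns with K = 8.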

7 Subtrajectory Coverage Maximization

The techniques discussed in this paper can also be used to obtain the following result.

Theorem 5. [Restated, see original statement.]

Let ε∈(0,1/5]. There is a ((e−1)/(16e),4+ε)-approximation algorithm for SCM, where e is the base of the natural logarithm. Given a polygonal curve P of complexity n, Δ>0, ℓ≤n and k>0, its running time is in 𝒪((k+ℓ)n2ε−2log2n⋅logε−1).

Proof Sketch.

We first compute an ε-approximation of the 4Δ-free space, which allows describing Cov^P() as a sum of linear functions; a sum of linear functions is itself a linear function. As we can maintain a combinatorial description of the proxy coverage, we are able to parsimoniously maintain this linear function during a scan of a sweep-sequence in 𝔖e, which allows evaluating Cov^P() for all elements of a single sweep-sequence in roughly 𝒪(nlogn) time. Using this subroutine once per round and per sweep-sequence results in a running time of 𝒪(kn2log2n).
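The round structure behind Theorem 5 is the classic greedy for cardinality-constrained coverage maximization [15, 19]. Stripped of the proxy-coverage machinery and the sweep-based evaluation, it can be sketched as follows (names are ours; candidate coverages are given explicitly):

```python
def greedy_max_coverage(candidates, k):
    """Greedy maximum coverage with a budget of k picks: in each round
    select the candidate with the largest marginal coverage. By the
    submodularity argument of Nemhauser, Wolsey and Fisher [19] this
    yields a (1 - 1/e)-approximation of the optimal coverage; the
    algorithm of Theorem 5 additionally pays constant factors for the
    proxy coverage and the relaxed distance threshold.
    candidates: dict name -> set of covered elements.
    """
    covered, solution = set(), []
    for _ in range(k):
        # pick the candidate with the largest marginal gain
        best = max(candidates, key=lambda r: len(candidates[r] - covered))
        if not candidates[best] - covered:
            break  # no candidate adds anything new; stop early
        solution.append(best)
        covered |= candidates[best]
    return solution, covered
```

In the actual algorithm the argmax in each round is found by sweeping all sweep-sequences once, evaluating the maintained linear function instead of explicit sets.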

References

  • [1] Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, and Erin Taylor. Subtrajectory clustering: Models and algorithms. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018, PODS ’18, pages 75–87, 2018. doi:10.1145/3196959.3196972.
  • [2] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl., 5:75–91, 1995. doi:10.1142/S0218195995000064.
  • [3] Frederik Brüning, Hugo A. Akitaya, Erin W. Chambers, and Anne Driemel. Subtrajectory clustering: Finding set covers for set systems of subcurves. Comput. Geom. Topol., 2(1):1:1–1:48, February 2023. doi:10.57717/cgt.v2i1.7.
  • [4] Frederik Brüning, Jacobus Conradi, and Anne Driemel. Faster approximate covering of subcurves under the Fréchet distance. In 30th Annual European Symposium on Algorithms (ESA 2022), volume 244 of Leibniz International Proceedings in Informatics (LIPIcs), pages 28:1–28:16, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ESA.2022.28.
  • [5] Kevin Buchin, Maike Buchin, David Duran, Brittany Terese Fasy, Roel Jacobs, Vera Sacristan, Rodrigo I. Silveira, Frank Staals, and Carola Wenk. Clustering trajectories for map construction. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’17, 2017. doi:10.1145/3139958.3139964.
  • [6] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Jorren Hendriks, Erfan Hosseini Sereshgi, Vera Sacristán, Rodrigo I. Silveira, Jorrick Sleijster, Frank Staals, and Carola Wenk. Improved map construction using subtrajectory clustering. In LocalRec’20: Proceedings of the 4th ACM SIGSPATIAL Workshop on Location-Based Recommendations, Geosocial Networks, and Geoadvertising, LocalRec@SIGSPATIAL 2020, November 3, 2020, Seattle, WA, USA, pages 5:1–5:4, 2020. doi:10.1145/3423334.3431451.
  • [7] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Maarten Löffler, and Jun Luo. Detecting commuting patterns by clustering subtrajectories. International Journal of Computational Geometry and Applications, 21(3):253–282, 2011. doi:10.1142/S0218195911003652.
  • [8] Maike Buchin, Bernhard Kilgus, and Andrea Kölzsch. Group diagrams for representing trajectories. International Journal of Geographical Information Science, 34(12):2401–2433, 2020. doi:10.1080/13658816.2019.1684498.
  • [9] Jacobus Conradi and Anne Driemel. Finding complex patterns in trajectory data via geometric set cover, 2023. doi:10.48550/arXiv.2308.14865.
  • [10] Mark de Berg, Atlas F. Cook IV, and Joachim Gudmundsson. Fast Fréchet queries. Computational Geometry, 46(6):747–755, 2013. doi:10.1016/j.comgeo.2012.11.006.
  • [11] Joachim Gudmundsson and Nacho Valladares. A GPU approach to subtrajectory clustering using the Fréchet distance. IEEE Trans. Parallel Distributed Syst., 26(4):924–937, 2015. doi:10.1109/TPDS.2014.2317713.
  • [12] Joachim Gudmundsson and Sampson Wong. Cubic upper and lower bounds for subtrajectory clustering under the continuous Fréchet distance, 2021. doi:10.48550/arXiv.2110.15554.
  • [13] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE transactions on pattern analysis and machine intelligence, 36(7):1325–1339, 2013. doi:10.1109/TPAMI.2013.248.
  • [14] David S Johnson. Approximation algorithms for combinatorial problems. In Proceedings of the fifth annual ACM symposium on Theory of computing, pages 38–49, 1973. doi:10.1145/800125.804034.
  • [15] Andreas Krause and Daniel Golovin. Submodular function maximization. In Lucas Bordeaux, Youssef Hamadi, and Pushmeet Kohli, editors, Tractability: Practical Approaches to Hard Problems, pages 71–104. Cambridge University Press, 2014. doi:10.1017/CBO9781139177801.004.
  • [16] Lily Lee and W Eric L Grimson. Gait analysis for recognition and classification. In Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pages 155–162. IEEE, 2002. doi:10.1109/AFGR.2002.1004148.
  • [17] Anqi Liang, Bin Yao, Bo Wang, Yinpei Liu, Zhida Chen, Jiong Xie, and Feifei Li. Sub-trajectory clustering with deep reinforcement learning. VLDB J., 33(3):685–702, 2024. doi:10.1007/s00778-023-00833-w.
  • [18] László Lovász. On the ratio of optimal integral and fractional covers. Discrete mathematics, 13(4):383–390, 1975. doi:10.1016/0012-365X(75)90058-8.
  • [19] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978. doi:10.1007/BF01588971.
  • [20] Sen Qiao, Y. Wang, and J. Li. Real-time human gesture grading based on OpenPose. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pages 1–6, 2017. doi:10.1109/CISP-BMEI.2017.8301910.
  • [21] Roniel S. De Sousa, Azzedine Boukerche, and Antonio A. F. Loureiro. Vehicle trajectory similarity: Models, methods, and applications. ACM Comput. Surv., 53(5), September 2020. doi:10.1145/3406096.
  • [22] Sherman K Stein. Two combinatorial covering theorems. Journal of Combinatorial Theory, Series A, 16(3):391–397, 1974. doi:10.1016/0097-3165(74)90062-4.
  • [23] Ivor van der Hoog, Thijs van der Horst, and Tim Ophelders. Faster and deterministic subtrajectory clustering, 2024. doi:10.48550/arXiv.2402.13117.
  • [24] Erik van Sebille, Stephen M. Griffies, Ryan Abernathey, Thomas P. Adams, Pavel Berloff, Arne Biastoch, Bruno Blanke, Eric P. Chassignet, Yu Cheng, Colin J. Cotter, Eric Deleersnijder, Kristofer Döös, Henri F. Drake, Sybren Drijfhout, Stefan F. Gary, Arnold W. Heemink, Joakim Kjellsson, Inga Monika Koszalka, Michael Lange, Camille Lique, Graeme A. MacGilchrist, Robert Marsh, C. Gabriela Mayorga Adame, Ronan McAdam, Francesco Nencioli, Claire B. Paris, Matthew D. Piggott, Jeff A. Polton, Siren Rühs, Syed H.A.M. Shah, Matthew D. Thomas, Jinbo Wang, Phillip J. Wolfram, Laure Zanna, and Jan D. Zika. Lagrangian ocean analysis: Fundamentals and practices. Ocean Modelling, 121:49–75, 2018. doi:10.1016/j.ocemod.2017.11.008.
  • [25] Yiqun Xie, Shashi Shekhar, and Yan Li. Statistically-robust clustering techniques for mapping spatial hotspots: A survey. ACM Comput. Surv., 55(2), January 2022. doi:10.1145/3487893.