Abstract 1 Introduction 2 Streaming 𝜹-simplification 3 Streaming algorithm for the 𝒌-simplification problem References

Simplification of Trajectory Streams

Siu-Wing Cheng ORCID HKUST, Hong Kong, China Haoqiang Huang ORCID HKUST, Hong Kong, China Le Jiang ORCID USTC, Hefei, China
Abstract

While there are software systems that simplify trajectory streams on the fly, few curve simplification algorithms with quality guarantees fit the streaming requirements. We present streaming algorithms for two such problems under the Fréchet distance dF in d for some constant d2.

Consider a polygonal curve τ in d in a stream. We present a streaming algorithm that, for any ε(0,1) and δ>0, produces a curve σ such that dF(σ,τ[v1,vi])(1+ε)δ and |σ|2opt2, where τ[v1,vi] is the prefix in the stream so far, and opt=min{|σ|:dF(σ,τ[v1,vi])δ}. Let α=2(d1)d/22+d. The working storage is O(εα). Each vertex is processed in O(εαlog1ε) time for d{2,3} and O(εα) time for d4 . Thus, the whole τ can be simplified in O(εα|τ|log1ε) time. Ignoring polynomial factors in 1/ε, this running time is a factor |τ| faster than the best static algorithm that offers the same guarantees.

We present another streaming algorithm that, for any integer k2 and any ε(0,117), maintains a curve σ such that |σ|2k2 and dF(σ,τ[v1,vi])(1+ε)min{dF(σ,τ[v1,vi]):|σ|k}, where τ[v1,vi] is the prefix in the stream so far. The working storage is O((kε1+ε(α+1))log1ε). Each vertex is processed in O(kε(α+1)log21ε) time for d{2,3} and O(kε(α+1)log1ε) time for d4.

Keywords and phrases:
streaming algorithm, curve simplification, Fréchet distance
Copyright and License:
[Uncaptioned image] © Siu-Wing Cheng, Haoqiang Huang, and Le Jiang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Computational geometry
Related Version:
Full Version: http://arxiv.org/abs/2503.23025
Funding:
This research is supported by the Research Grants Council, Hong Kong, China (project no. 16208923).
Editors:
Oswin Aichholzer and Haitao Wang

1 Introduction

The pervasive use of GPS sensors has enabled tracking of moving objects. For example, a car fleet can use real-time information about its vehicles to deploy them in response to dynamic demands. The sensor periodically samples the location of the moving object and sends it as a vertex to the remote cloud server for storage and processing. The remote server interprets the sequence of vertices received as a polygonal curve (trajectory).

For a massive stream, it has been reported [6, 7, 18, 19] that sending all vertices uses too much network bandwidth and storage at the server, and it may aggravate issues like out-of-order and duplicate data points. Software systems have been built for the sensor side to simplify trajectory streams on the fly (e.g. [16, 17, 18, 27, 28]). A local buffer is used. Every incoming vertex in the stream triggers a new round of computation that uses only the incoming vertex and the data in the local buffer. At the end of a round, these systems may determine the next vertex and send it to the cloud server, but they may also send nothing.

It is important to keep the local buffer small [16, 17, 18, 27, 28]. It means a small working storage and less computation in processing the next vertex in the stream, which enables real-time response and requires less computing power. In theoretical computer science, streaming algorithms have three goals [22]. Let n be the number of items in the stream so far. First, the working storage should be o(n) if not polylog(n). Second, an item in the stream should be processed in polylog(n) time or at worst o(n) time because items are arriving continuously. Third, although the best solution is unattainable as the entire input is not processed as a whole, the solution should be provably satisfactory. Let τ be the curve in the stream. The abovementioned software systems only deal with some local error measures between the simplified curve σ and τ. We want to provide guarantees on the size of σ and the global similarity between σ and τ.

A popular similarity measure is the Fréchet distance [14]. Let ρτ:[0,1]d be a parameterization of τ such that, as t increases from 0 to 1, ρτ(t) moves from the beginning of τ to its end without backtracking. It is possible that ρτ(t)=ρτ(t) for two distinct t,t[0,1]. We can similarly define a parameterization ρσ for σ. These parameterizations induce a matching between ρσ(t) and ρτ(t) for all t[0,1]. A point may have multiple matching partners. The Fréchet distance is dF(σ,τ)=infmaxt[0,1]d(ρσ(t),ρτ(t)). We call a matching that realizes the Fréchet distance a Fréchet matching.

We study two problems. The δ-simplification problem is to compute, for a given δ>0, a curve σ of the minimum size (number of vertices) such that dF(σ,τ)δ. The k-simplification problem is to compute, for a given integer k2, a curve σ of size k that minimizes dF(σ,τ).

Given a polygonal curve τ=(v1,v2,), |τ| denotes its number of vertices, and for any i<j, τ[vi,vj] denotes the subcurve (vi,vi+1,,vj). For a subset Pd, d(x,P)=infpPd(x,p). Given two points x,yd, xy denotes the oriented line segment from x to y.

1.1 Related works

Static 𝜹-simplification.

Let n=|τ|. In , if the vertices of σ are restricted to lie on τ (anywhere), then |σ| can be minimized in O(n) time [24]; if the vertices of σ must be a subset of those of τ, it takes O(n) time to compute σ such that |σ| is at most two more than the minimum [10]. In , there is a data structure [25] that, for any given δ, reports a curve σ with the minimum |σ| such that the vertices of σ lie on τ (anywhere) and dF(σ,τ)δ. The query time is O(|σ|), and the preprocessing time is O(n).

In 2, a curve σ with minimum |σ| such that dF(σ,τ)δ can be computed in O(n2log2n) time [15]. In d for d3, if the vertices of σ must be a subset of those of τ, then |σ| can be minimized in O(n3) time [5, 24, 26]. If there is no restriction on the vertices of σ, it has been recently shown that |σ| can be minimized in O((|σ|n)O(d|σ|)) time [9].

Faster algorithms have been developed by allowing inexact output. Let κ(τ,r) be the minimum simplified curve size for an error r. In 2, it was shown [3] that for any δ>0, a curve σ can computed in O(nlogn) time such that dF(σ,τ)δ and |σ|κ(τ,δ/2). It is possible that κ(τ,δ/2)κ(τ,δ). A better control of the curve size has been obtained later. In d, it was shown [24] that for any ε(0,1) and δ>0, a curve σ can be constructed in O(ε22dn2lognloglogn) time such that dF(σ,τ)(1+ε)δ and |σ|2κ(τ,δ)2.111In [24], the first and last vertices of σ are restricted to be the same as those of τ. Later, it was shown [8] that for any α,ε(0,1) and δ>0, a curve σ can be computed in O~(nO(1/α)(d/(αε))O(d/α)) time such that dF(σ,τ)(1+ε)δ and |σ|(1+α)κ(τ,δ).

The algorithms above for d2 do not fit the streaming setting [3, 5, 8, 9, 15, 24, 26]. In [3], the simplified curve σ is a subsequence of the vertices in τ, and the vertices of σ are picked from τ in a traversal of τ. Given the last pick vi, for any j>i, whether vj should be included in σ is determined by examining the subcurve τ[vi,vj]. This requires an O(n) working storage which is too large. If the algorithms in [5, 8, 9, 15, 24, 26] are used in the streaming setting, the processing time per vertex would be O(n) or more which is too high.

Static 𝒌-simplification.

In , for any given k, the data structure for δ-simplification in [25] can be queried in O(k) time to report a curve σ such that |σ|=k, the vertices of σ lie on τ (anywhere), and dF(σ,τ) is minimized. In d, if the vertices of σ must be a subset of those of τ, it was shown [14] that a solution σ can be computed in O(n4logn) time such that dF(σ,τ)7min{dF(σ,τ):|σ|k,no restriction on the vertices of σ}. The factor 7 can be reduced to 4 by a better analysis [3]. Nevertheless, if this algorithm is used in the streaming setting, the vertex processing time would be O(n3logn) or more, which is way too high.

Streaming line simplification.

A line simplification σ of a stream τ is a subsequence of the vertices such that the first and last vertices of σ are the first and last vertices in the stream so far. In 2, for any integer k2 and any ε(0,1), it was shown [1, 2] how to maintain σ such that |σ|2k and dF(σ,τ)(42+ε)optk, where optk=min{dF(σ,τ):line simplification σ of τ,|σ|k}. The working storage is O(k2+kε1/2). Each vertex is processed in O(klog1ε) amortized time. There are also streaming software systems for this problem that minimize some local error measures (e.g. [13, 20, 21, 23]); however, they do not provide any guarantee on the global similarity between τ and the output curve.

Discrete Fréchet distance.

The discrete Fréchet distance ddF(σ,τ) is obtained with the restriction that the parameterizations ρσ and ρτ must match the vertices of σ to those of τ and vice versa. Note that ddF(σ,τ)dF(σ,τ) and ddF(σ,τ)dF(σ,τ) in the worst case.

There are several results in 3 [4]. A curve σ with the minimum |σ| such that ddF(σ,τ)δ can be computed in O(nlogn) time; if the vertices of σ must be a subset of those of τ, the computation time increases to O(n2). On the other hand, if k is given instead of δ, a curve σ such that |σ|=k and ddF(σ,τ) is minimized can be computed in O(knlognlog(n/k)) time; if the vertices of σ must a subset of those of τ, the computation time increases to O(n3).

In d, a streaming k-simplification algorithm was proposed in [11] with guarantees on ddF(σ,τ), where τ is the curve seen in the stream so far. It was shown that for any integer k2 and any ε(0,1), a curve σ can be computed such that |σ|k and ddF(σ,τ)8min{ddF(σ,τ):|σ|k}. The working storage is O(kd).

Improved results have been obtained subsequently [12]. For the streaming δ-simplification problem, there is a streaming algorithm in d that, for any γ>1, computes a curve σ such that ddF(σ,τ)δ and |σ|κ(τ,δ/γ). This algorithm relies on a streaming algorithm for maintaining a γ-approximate minimum closing ball (MEB) of points in d; the processing time per vertex is asymptotically the same as the γ-approximate MEB streaming algorithm. There are two streaming k-simplification algorithms in [12]. The first one uses O(1εkdlog1ε)+O(ε)(d+1)/2log21ε working storage and maintains a curve σ such that |σ|k and ddF(σ,τ)(1+ε)min{ddF(σ,τ):|σ|k}. The second algorithm uses O(1εkdlog1ε) working storage and maintains a curve σ such that |σ|k and ddF(σ,τ)(1.22+ε)min{ddF(σ,τ):|σ|k}.

1.2 Our results

We present streaming algorithms for the δ-simplification and k-simplification problems in d for d2 under the Fréchet distance. Let τ=(v1,v2,) be the input stream.

Streaming 𝜹-simplification.

Our algorithm outputs vertices occasionally and maintains some vertices in the working storage. The simplified curve σ for the prefix τ[v1,vi] in the stream so far consists of vertices that have been output and vertices in the working storage. Vertices that have been output cannot be modified, so they form a prefix of all simplified curves to be produced in the future for the rest of the stream.

For any ε(0,1) and any δ>0, our algorithm produces a curve σ for the prefix τ[v1,vi] in the stream so far such that dF(σ,τ[v1,vi])(1+ε)δ and |σ|2κ(τ[v1,vi],δ)2. Let α=2(d1)d/22+d. The working storage is O(εα). Each vertex in the stream is processed in O(εαlog1ε) time for d{2,3} and O(εα) time for d4. If our algorithm is used in the static case, the running time on a curve of size n is O(εαnlog1ε). Ignoring polynomial factors in 1/ε, our time bound is a factor n smaller than the O(ε22dn2lognloglogn) running time of the best static algorithm that achieves the same error and size bounds [24].

Intuitively, our streaming algorithm finds a line segment that stabs as many balls as possible that are centered at consecutive vertices of τ with radii (1+O(ε))δ. If such a line segment ceases to exist upon the arrival of a new vertex, we start a new line segment. Concatenating these segments forms the simplified curve. For efficiency, we approximate each ball by covering it with grid cells of O(εδ) width. In [24], vertex balls and their covers by grid cells of O(εδ) width are also used. Connections between all pairs of balls are computed to create a graph such that the simplified curve corresponds to a shortest path in the graph. In contrast, we maintain some geometric structures so that we can read off the next line segment from them. We prove an O(εα) bound on the total size of these structures, which yields the desired working storage and processing time per vertex.

Streaming 𝒌-simplification.

A memory budget may cap the size of the simplified curve, motivating the streaming k-simplification problem. For example, in some wildlife tracking applications, the location-acquisition device is not readily accessible after deployment, and data is only offloaded from it after an extended period [19].

Our algorithm maintains the simplified curve σ in the working storage for the prefix τ[v1,vi] in the stream so far. For any k2 and any ε(0,117), our algorithm guarantees that |σ|2k2 and dF(σ,τ[v1,vi])(1+ε)min{dF(σ,τ[v1,vi]):|σ|k}. Recall that α=2(d1)d/22+d. The working storage is O((kε1+ε(α+1))log1ε). Each vertex in the stream is processed in O(kε(α+1)log21ε) time for d{2,3} and O(kε(α+1)log1ε) time for d4. We first simplify as in the case of δ-simplification. When the simplified curve reaches a size of 2k1, the key idea is to run our δ-simplification algorithm with a suitable error tolerance to bring its size down to 2k2.

There is only one prior streaming k-simplification algorithm [1, 2]. It works in 2. It restricts the vertices of the simplified curve σ to be a subset of those of τ, whereas our algorithm does not. It uses O(k2+kε1/2) working storage, processes each vertex in O(klog1ε) amortized time, and guarantees that |σ|2k and dF(σ,τ)(42+ε)optimum under the requirement that the vertices of σ must be a subset of those of τ. For d=2, our algorithm uses O((kε1+ε5)log1ε) working storage, processes each vertex in O(kε5log21ε) worst-case time, and guarantees that |σ|2k2 and dF(σ,τ)(1+ε)optimum. Ignoring the restriction on the vertices of σ, we offer a better approximation ratio for dF(σ,τ). Although our vertex processing time bound is larger, it is worst-case instead of amortized. The comparison between working storage depends on the relative magnitudes of k and 1/ε.

2 Streaming 𝜹-simplification

Given a subset Xd, we use conv(X) to denote the convex hull of X. For any point xd, let Bx denote the d-ball centered at x with radius δ. Consider the infinite d-dimenisonal grid with x as a grid vertex and side length εδ/(2d). Let Gx be the subset of grid cells that intersect the ball centered at x with radius (1+ε/2)δ. We say that an oriented line segment s stabs the objects O1,,Om in order if there exist points xisOi for i[m] such that x1,,xm appear in this order along s [15].

2.1 Algorithm

The high-level idea is to find the longest sequence v1,v2,,vi of vertices such that there is a segment s1 that stabs Bva for a[i] in order. Then, restart to find the longest sequence vi+1,,vj such that there is a segment s2 that stabs Bva for a[i+1,j] in order. Repeating this way gives a sequence of segments s1,s2,. Concatenating these segments gives the simplified curve. The problem is the large working storage: a long vertex sequence may be needed to determine a line segment. We use several ideas to overcome this problem.

First, we approximate Bva by conv(Gva) and find a line segment that stabs the longest sequence conv(Gv1),,conv(Gvi) in order. We will show that it suffices for this line segment to start from the set P of grid points in Gv1. So |P|=O(εd).

Second, as the vertices arrive in the stream, for each point pP, we construct a structure Sa[p] inductively as follows:

  • Set S1[p]={p} for all pP. We write S1[p]=p for convenience.

  • For all a1, set Sa+1[p]=conv(Gva+1)F(Sa[p],p) for all pP, where F(Sa[p],p)={yd:pySa[p]}.

If pSa[p], then F(Sa[p],p)=d and hence Sa+1[p]=conv(Gva+1). If Sa[p]=, then F(Sa[p],p)= and hence Sa+1[p]=. Otherwise, by viewing p as a light source, F(Sa[p],p) is the unbounded convex polytope consisting of Sa[p] and the non-illuminated subset of d.

We will show that for a2, Sa[p] is equal to {xconv(Gva):px stabs conv(Gv1), , conv(Gva) in order}. We will also show that for each pP, Sa[p] and F(Sa[p],p) have poly(1/ε) complexities, and they can be constructed in poly(1/ε) time. The array Sa is deleted after computing Sa+1, which keeps the working storage small.

We pause when we encounter a vertex vi+1 such that Si+1[p]= for all pP. We choose any vertex q of any non-empty Si[p] (which must exist) and output the segment pq. Since pq stabs conv(Gv1),,conv(Gvi) in order, we can show that dF(pq,τ[v1,vi])(1+ε)δ. All is well except that Si has been deleted after computing Si+1. A fix is that after processing a vertex va, we store a segment pq in a buffer σ~ for an arbitrary vertex q of an arbitrary non-empty Sa[p]. We output the content of σ~ after discovering that Si+1[p]= for all pP.

Afterward, we reset P as the set of grid points in Gvi+1 and restart the above processing from vi+1. That is, reset Si+1[p]=p for all pP. One of the points in P will be the start of the next segment that will be output in the future. The start of this next segment may be far from conv(Gvi), which contains the end of σ~ that was just output. For a=i+1,i+2,, we set Sa+1[p]=conv(Gva+1)F(Sa[p],p) for all pP until Sa[p] becomes empty for all pP again. In summary, the simplified curve σ is the concatenation of the output segments s1,s2,, i.e., the end of sj is connected to the start of sj+1. Figure 1 illustrates a few steps of this algorithm. The procedure Simplify(ε,δ) in Algorithm 1 gives the details.

Figure 1: The shaded balls denote the conv(Gva)’s. The vertices arrive in order (from left to right and top to bottom). The first P consists of the grid points inside the blue dashed ball. Let p1 be a grid point in P. F(S2[p1],p1) is the green unbounded convex region that is bounded by the tangents from p1 to S2[p1]. S3[p1]=conv(Gv3)F(S2[p1],p1) is denoted by the dashed clipped ball around v3. Similarly, F(S3[p1],p1) is the red unbounded convex region that is bounded by the tangents from p1 to S3[p1]. S4[p1] is denoted by the dashed clipped ball around v4. In this example, S5[p]= for all pP, so we output σ~ which may be equal to (p1,q) for a vertex q of S4[p1]. We reset P as the set of grid points in Gv5 (inside the red dashed ball) and reset S5[p]=p for all pP.
Algorithm 1 Simplify.

Input: A stream τ=(v1,v2,), ε, and δ.

Output: The output curve σ consists of the sequence of vertices output so far and the vertices in the buffer σ~. After processing the last vertex vi at the end of the stream, Simpify returns (P,Si), which will be useful for the k-simplification problem.

2.2 Analysis

Given a polytope Qd, let |Q| denote its complexity, i.e., the total number of its faces of all dimensions.

Lemma 1.

Let P be the set of grid points in Gvi. Assume that Si[p]=p for all pP. Take any pP and any index ji+1 such that Sa[p] for all a[i,j1].

  1. (i)

    Sj[p]={xconv(Gvj):px stabs conv(Gvi),,conv(Gvj) in order}.

  2. (ii)

    Sj[p] is a convex polytope that can be computed in O(|Sj1[p]|+NlogN+Nd/2) time, where N is any upper bound on the number of support hyperplanes of F(Sj1[p],p) such that N=Ω(ε(1d)d/2).

Lemma 2.

Let τ[v1,vi] be the prefix in the stream so far. The simplified curve σ satisfies the properties that dF(σ,τ[v1,vi])(1+ε)δ and |σ|2κ(τ[v1,vi],δ)2.

Theorem 3.

Let τ be a polygonal curve in d that arrives in a data stream. Let α=2(d1)d/22+d. For every δ>0 and every ε(0,1), the output curve σ of Simplify(ε,δ) satisfies dF(σ,τ)(1+ε)δ and |σ|2κ(τ,δ)2. The working storage is O(εα). Each vertex is processed in O(εαlog1ε) time for d{2,3} and O(εα) time for d4.

Proof.

The guarantees on σ follow from Lemma 2. We will bound the number of support hyperplanes and complexities of Sj[p] and F(Sj1[p],p). The working storage and processing time then follow from Lemma 1(ii) and the fact that |P|=O(εd).

A bounding halfspace of a convex polytope O is a halfspace that contains O and is bounded by the support hyperplane of a facet of O. We count the bounding halfspaces of Sj[p]. Some are bounding halfspaces of F(Sj1[p],p) whose boundaries pass through p and are tangent to Sj1[p]. We call them the pivot bounding halfspaces. The remaining ones are bounding halfspaces of conv(Gva) for some aj. We call these non-pivot bounding halfspaces.

Since conv(Gva)’s are translates of each other, their facets share a set V of unit outward normals. We have |V||conv(Gva)|=O(ε(1d)d/2). No two facets of Sj[p] have the same vector in V because the two corresponding bounding halfspaces would be identical or nested – neither is possible. This feature helps us to prevent a cascading growth in the complexities of Sj[p] in the inductive construction.

Take a pivot bounding halfspace h of Sj[p]. Suppose that h was introduced for the first time as a bounding halfspace of Si[p] for some ij. The boundary of h, denoted by h, passes through p and is tangent to Si1[p] at a (d2)-dimensional face f of Si1[p]. Let g1 and g2 be the bounding halfspaces of Si1[p] whose boundaries contain the two facets of Si1[p] incident to f. We make three observations. First, F(Si1[p],p) lies inside the cone Ci of rays that shoot from p towards Si1[p]. Second, h is a bounding halfspace of Ci. Third, Sa[p]Ci for all a[i,j].

We claim that neither g1 nor g2 passes through p. Neither g1 nor g2 is equal to h because h was not introduced before Si[p]. If both g1 and g2 pass through p, then f spans a (d2)-dimensional affine subspace that passes through p. But then h meets Ci at only, which is a contradiction because h should support a facet of Sj[p]. If either g1 or g2 passes through p, say g2, then the affine subspace spanned by f does not pass through p. But then p and f span a unique hyperplane, giving the contradiction that g2=h.

By our claim, g1 and g2 are non-pivot bounding halfspaces of Si1[p]. We charge h to (ϕ1,ϕ2) or (ϕ2,ϕ1), where ϕ1 and ϕ2 are the outward unit normals of g1 and g2, respectively. The pair (ϕ1,ϕ2) is charged if a ray that shoots from p to an interior point of g1g2 arbitrarily near g1g2 hits g1 before g2. Otherwise, the pair (ϕ2,ϕ1) is charged. Suppose that h is charged to (ϕ1,ϕ2). We argue that (ϕ1,ϕ2) is not charged again at some (d2)-dimensional face of Sa[p] for all a[i,j1] due to another pivot bounding halfspace of Sj[p].

Assume to the contrary that (ϕ1,ϕ2) is charged again at some (d2)-dimensional face of Sb[p] for some b[i,j1] due to a pivot bounding halfspace h of Sj[p]. Either the dimension of hCi is zero or one, or the dimension of hCi is two. In the first case, h cannot support any facet of Sj[p], a contradiction. In the second case, since h is charged to the same ordered pair (ϕ1,ϕ2), h must separate Sb[p] from h. Therefore, the cone of rays Cb that shoot from p towards Sb[p] meets h only at p. However, Sj[p]Cb, which means that h cannot support any facet of Sj[p], a contradiction.

In all, Sj[p] has O(ε(1d)d/2) non-pivot bounding halfspaces and at most |V|(|V|1)=O(ε2(1d)d/2) pivot bounding halfspaces. The same bounds hold for Sj1[p]. If a support hyperplane L of F(Sj1[p],p) is not a support hyperplane of Sj1[p], then L passes through p and is tangent to Sj1[p] at some (d2)-dimensional face f. Since L is different from the support hyperplanes of the two facets of Sj1[p] incident to f, these two hyperplanes do not pass through p. So the two facets incident to f induce some non-pivot bounding halfspaces of Sj1[p], implying that F(Sj1[p],p) has O(ε2(1d)d/2) bounding halfspaces. It follows that |Sj[p]| and |F(Sj1[p],p)| are O(ε2(1d)d/22).

3 Streaming algorithm for the 𝒌-simplification problem

3.1 Algorithm

Let σi be the optimal solution for the k-simplification problem for τ[v1,vi]. If we knew δi=dF(σi,τ[v1,vi]), we could call Simplify(ε,δi) to obtain an approximate solution for the k-simplification problem. Therefore, we try to estimate δi on the fly.

We maintain an output simplified curve σi in the working storage after processing every vertex vi in the stream. The first simplified curve σ2k1 is obtained by calling Simplify, via an invocation of a new procedure Compress, on τ[v1,v2k1] with δ equal to some value δ2k1. Given a curve of size 2k1, Compress simplifies it to a curve of size at most 2k2.

For i2k, when a vertex vi arrives in the stream, we either update the last edge of σi1 to produce σi as in Simplify, or start a new segment (initially just the vertex vi) as in Simplify. Suppose that we start new segment. If |σi1|<2k2, we can append vi to σi1 to form σi. Otherwise, σi is obtained by calling Simplify, via an invocation of Compress, on σi1(vi) with δ equal to some value δi to be specified later. We will prove in Lemmas 48 that δi(1+O(ε))δi. Afterward, we repeat the above to process the next vertex in the stream. The above processing is elaborated in Reduce in Algorithm 2.

We define a curve τi for every i2k1 in the comments in lines 6, 11, and 22. These curves facilitate the analysis, but they are not maintained by the algorithm as variables. For each i, τi is the curve from which the simplification σi is computed. Therefore, τ2k1=τ[v1,v2k1] (line 6). Consider any i2k. If we call Compress to simplify σi1(vi) to a curve σi of size at most 2k2, we set τi=σi1(vi) (line 22). Otherwise, σi is obtained by simplifying τi1(vi) whether we start a new segment at vi or not, so τi=τi1(vi) (line 11).

The input parameter r of Reduce needs an explanation. We will show how to compute a lower bound δmin>0 for dF(σ2k1,τ[v1,v2k1]). For i2k1, let ri1 and si0 be integers such that

1εsiδmin(1+ε)ri1εsiδminδi(1+ε)riεsiδmin1εsi+1δmin.

We will show that if we can start the processing of the data stream with an initial error tolerance of δ2k2=ε(1+ε)ri+1(14ε)1δmin, then the output curve σi will be within a Fréchet distance of εsi(1+ε)ri+1(14ε)1δmin from τ[v1,vi]. By the definitions of ri and si, the error is at most (1+O(ε))δi. The correct values of ri and si are unknown to us. Nevertheless, no matter what si is, we have 1rilog1+ε1ε. We launch log1+ε1ε=O(1εlog1ε) independent runs of Reduce for each r between 1 and log1+ε1ε. After processing the arriving vertex vi, the run with the minimum δi gives the right answer. This idea of using multiple independent runs to capture the right δ was proposed in [12] for streaming k-simplification under the discrete Fréchet distance. We still have to obtain si. This is achieved by establishing a lower bound for δi (Lemma 5) and computing a particular upper bound for δi in line 2 of Compress.

We assume that no three consecutive vertices in the stream are collinear. We enforce it using O(1) more working storage as follows. Remember the last two vertices (x,y) in the stream, but do not feed y to Reduce yet. Wait until the next vertex z arrives. If x, y and z are not colinear, feed y to Reduce and remember (y,z) as the last two vertices. If x, y and z are collinear, then drop y, make (x,z) the last two vertices, and wait for the next vertex.

Algorithm 3 shows the procedure Compress. If Compress is called when processing vi, its task is to simplify τi=σi1(vi) to a curve σi of size at most 2k2 with error estimate δi=εtδi1, where t is some judiciously chosen positive integer.

Algorithm 2 Reduce.

Input: A data stream τ=(v1,v2,) and three parameters r, ε, and k.
Output: Let τ[v1,vi] be the prefix in the stream so far. A curve σi is maintained such that |σi|2k2 and dF(σi,τ[v1,vi])(1+ε)δi.

Algorithm 3 Compress.

Input: A polygonal curve (x1,,x2k1) and three parameters ε, k, and δ.
Output: (ζ,εtδ,P,S), where ζ is the output curve of calling Simplify(ε,εtδ) on (x1,,x2k1) for some t, and (P,S) are the output returned by Simplify after processing x2k1.

3.2 Analysis

Lemma 4 below gives a lower bound for the Fréchet distance between a curve of size m and another curve of size n such that m2n1. An application of this result shows that δmin computed in line 4 of Reduce is a lower bound for dF(σ2k1,τ[v1,v2k1]) as |σ2k1|=k.

Lemma 4.

Let ξ=(p1,,pm) and ζ=(q1,,qn) be any two curves such that m2n1. Then, dF(ξ,ζ)12min{d(pi,pi1pi+1):i[2,m1]}.

Lemma 5 below shows that a curve (p1,,p2k1) can be simplified to a curve of size at most 2k2 using an error tolerance of 12min{d(pj,pj1pj+1):j[2,2k2],j is even}. By Lemma 5, Compress always returns a curve of size at most 2k2.

Lemma 5.

Let ξ=(p1,,p2k1). Assume no three consecutive vertices are collinear. Let δ^ be the minimum value such that calling Simplify(ε,δ^) on ξ produces a curve of size at most 2k2. Then,

δ^12min{d(pj,pj1pj+1):j[2,2k2],j is even}(1+ε)δ^.

Proof.

Consider the run of Simplify(ε,δ^) on ξ. Let pi1,pi2 be the vertices in order along ξ at which Simplify starts a new segment. Note that i1=1 and ij+1ij+2 for all j.

Simplify replaces each subcurve ξ[pij,pij+11] by a segment. The simplified curve is a concatenation of these segments. Since |ξ|=2k1, there must be an index j such that ij+1>ij+2 so that Simplify(ε,δ^) simplifies ξ to a curve of size at most 2k2. Let b=min{j:ij+1>ij+2}. Because i1=1 and ij+1=ij+2 for all j[b1], we know that ij is odd for every j[b] and ib+1 is even. It follows from the choice of b that Simplify does not start a new segment at pib+2. By the working of Simplify(ε,δ^), there is a grid point p in Gpib (defined with δ=δ^) such that Sib+2[p]. By the reasoning in the proof of Lemma 2, for any point xSib+2[p], dF(px,ξ[pib,pib+2])(1+ε)δ^. On the other hand, by Lemma 4, dF(px,ξ[pib,pib+2])12d(pib+1,pibpib+2). Hence, (1+ε)δ^12d(pib+1,pibpib+2)12min{d(pj,pj1pj+1):j[2,2k2],j is even}.

It remains to argue that δ^12min{d(pj,pj1pj+1):j[2,2k2],j is even}. Let δ=12min{d(pj,pj1pj+1):j[2,2k2],j is even}. It suffices to show that calling Simplify(ε,δ) on ξ returns a curve of size at most 2k2. Let pa1,pa2, be the vertices in order along ξ at which Simplify(ε,δ) starts a new segment. Let c be the even number in [2,2k2] such that d(pc,pc1pc+1)=min{d(pj,pj1pj+1):j[2,2k2],j is even}.

Suppose that there is an index j that satisfies ajc1 and aj+1>aj+2. Since ξ[paj,paj+11] is simplified to a line segment, the prefix ξ[p1,paj+11] is simplified to a curve of size at most aj+12, i.e., at least one vertex less. We are done because the size of the whole simplified curve must be at most 2k2.

The remaining possibility is that aj+1=aj+2 for all aj[c1]. Since a1=1 and c is even, we have a1=1,a2=3,a3=5,,aj=c1. It follows that Simplify starts a new segment at pc1. Let r=d(pc,pc1pc+1).

We claim that there is a line segment xy such that dF(xy,ξ[pc1,pc+1])r/2. Let z be the nearest point in pc1pc+1 to pc. So zpc has length r. Let z be the midpoint of zpc. Translate pc1pc+1 by the vector zz to obtain a segment xy. Observe that zxy and d(x,pc1)=d(y,pc+1)=dF(z,pc)=r/2. By linear interpolations between xz and pc1pc and between zy and pcpc+1, we have dF(xy,ξ[pc1,pc+1])r/2, establishing our claim.

By the choice of c, 12r=12d(pc,pc1cc+1)=δ. Then, our claim implies that xy stabs balls of radii δ centered at pc1,pc,pc+1 in order. There is a grid point p in Gpc1 (defined using δ) within a distance εδ/2 from x. By a linear interpolation between py and xy, we know that py stabs conv(Gpc1),conv(Gpc),conv(Gpc+1) in order. Therefore, Sc+1[p] by Lemma 1(i), implying that Simplify does not start a new segment at pc+1. Hence, Simplify replaces a subcurve ξ[pc1,pe] for some ec+1 by a segment and produces a curve of size at most 2k2. This proves that δδ^.

Recall the curves τi for i2k1 defined in the comments in lines 6, 11, and 22. The next result follows immediately from the working of Reduce.

Lemma 6.

For i2k1, σi computed by Reduce can also be produced by calling Simplify(ε,δi) on τi.

Lemma 7.

Assume that ε(0,1/3]. For i2k1, dF(τi,τ[v1,vi])2εδi.

Recall that σi is the curve of size k at minimum Fréchet distance from τ[v1,vi] and that δi=dF(σi,τ[v1,vi]). Also, recall that for i2k1, ri1 and si0 are integers that satisfy the following inequalities:

1εsiδmin(1+ε)ri1εsiδminδi(1+ε)riεsiδmin1εsi+1δmin. (1)

By Lemma 4, δminδ2k1. Also, δaδb for all a<b because a Fréchet matching between σb and τ[v1,vb] also matches τ[v1,va] to a curve of size at most k. Therefore, δminδi for i2k1, which implies that ri and si are well defined for i2k1. Note that ri is an integer between 1 and log1+ε(1/ε).

Lemma 8.

For all i2k1 and ε(0,117), Reduce(ri,ε,k) computes δi(1+8ε)δi.

Proof.

We prove that Reduce(ri,ε,k) computes δiεsi1+ε)ri+1(14ε)1δmin. By (1), this is at most (1+ε)2(14ε)1δi(1+8ε)δi for ε1/17.

Consider the case of i=2k1. We examine the call Reduce(r2k1,ε,k). By (1), εs2k1(1+ε)r2k11δminδ2k1εs2k1(1+ε)r2k1δmin. Theorem 3 and Lemma 5 imply that εs2k1(1+ε)r2k11δminδ2k112min{d(vj,vj1vj+1):j[2,2k2],j is even}(1+ε)δ2k1εs2k1(1+ε)r2k1+1δmin. Since the input parameter r of Reduce is equal to r2k1, line 5 of Reduce sets δ2k2=ε(1+ε)r2k1+1(14ε)1δmin. For ε1/17, we have ε1>(1+ε)2(14ε)1, and so ε1s2k1(1+ε)r2k1+1(14ε)1δmin<εs2k1(1+ε)r2k11δmin12min{d(vj,vj1vj+1):j[2,2k2],j is even}. As a result, line 2 of Compress ensures that δ2k1=εs2k1(1+ε)r2k1+1(14ε)1δmin.

Consider an index i>2k1. We examine the call Reduce(ri,ε,k). Define:

Δ=εsi(1+ε)ri+1δmin. (2)

Our goal is to prove that δi(14ε)1Δ. We rewrite the middle inequalities in (1) as

(1+ε)2Δδi(1+ε)1Δ. (3)

Define:

a=max{j[2k2,i]:δjε(14ε)1Δ}. (4)

The index a is well defined because one can verify that δ2k2=ε(1+ε)ri+1(14ε)1δminε(14ε)1Δ.

By (4), (2), line 5 of Reduce, and line 2 of Compress, δa=εh(14ε)1Δ for some integer h1. If a=i, then δi=δaε(14ε)1Δ, so we are done. Suppose that a<i. Note that δa+1δa by (4), which implies that δa+1>δa. So τa+1=σa(va+1), and Compress(τa+1,ε,k,δa) is called to produce σa+1 and δa+1. We have

dF(σa+1,τa+1) dF(σa+1,τ[v1,va+1])+dF(τa+1,τ[v1,va+1]) (5)
= δa+1+dF(τa+1,τ[v1,va+1])δi+dF(τa+1,τ[v1,va+1]).

If a+1=2k1, then τa+1=τ2k1=τ[v1,v2k1], so dF(τa+1,τ[v1,va+1])=0. Suppose that a+1>2k1. By Lemma 6 and Theorem 3, dF(σa,τa)(1+ε)δa. By Lemma 7, dF(τa,τ[v1,va])2εδa. So dF(τa+1,τ[v1,va+1])dF(σa,τ[v1,va])dF(σa,τa)+dF(τa,τ[v1,va])(1+3ε)δa. Since δaε(14ε)1Δ, we conclude that

dF(τa+1,τ[v1,va+1])ε(1+3ε)(14ε)1Δ. (6)

Plugging (6) into (5) and using (3), we obtain the following inequality for ε<1/17:

dF(σa+1,τa+1)(11+ε+ε+3ε214ε)ΔΔ(1+ε)(14ε). (7)

Consider the call of Simplify in Compress(τa+1,ε,k,δa). By (7) and Theorem 3, calling Simplify on τa+1 with an error tolerance of 1(1+ε)(14ε)Δ will produce a curve of size at most 2k2. Then, by Lemma 5, 12min{d(vj,vj1vj+1):j[2,2k2],j is even}(1+ε)1(1+ε)(14ε)Δ=(14ε)1Δ. Line 2 of Compress ensures that δa+1=εtδa=εht(14ε)1Δ for the smallest integer t1 such that δa+112min{d(vj,vj1vj+1):j[2,2k2],j is even}. Hence, t cannot be greater than h. But t cannot be less than h because δa+1>ε(14ε)1Δ by (4). So δa+1=(14ε)1Δ. For every b[a+2,i],

dF(σb,τa+1τ[va+2,vb]) dF(σb,τ[v1,vb])+dF(τa+1τ[va+2,vb],τ[v1,vb])
δb+dF(τa+1,τ[v1,va+1])
δi+ε(1+3ε)(14ε)1Δ((6))
< (14ε)1Δ=δa+1.((3))

Then, when processing va+2, Lemma 6 and Theorem 3 imply that Reduce executes lines 1020 with δa+2=δa+1=(14ε)1Δ to produce σa+2 of size at most 2k2. Applying this reasoning inductively gives δi=(14ε)1Δ.

Theorem 9.

Let τ be a polygonal curve in d that arrives in a data stream. Let α=2(d1)d/22+d. There is a streaming algorithm that, for any integer k2 and any ε(0,117), maintains a curve σ such that |σ|2k2 and dF(τ,σ)(1+ε)min{dF(τ,σ):|σ|k}. It uses O((kε1+ε(α+1))log1ε) working storage and processes each vertex of τ in O(kε(α+1)log21ε) time for d{2,3} and O(kε(α+1)log1ε) time for d4.

Proof.

Let δ=min{dF(τ,σ):|σ|k}. By Lemma 8, if we launch log1+ε/10(10/ε)=O(1εlog1ε) runs of Reduce using ε/10 instead of ε, there is a run that outputs a curve σ such that dF(τ,σ)(1+ε/10)(1+8ε/10)δ(1+ε)δ. Then, the working storage and processing time per vertex follow from Theorem 3.

References

  • [1] M.A. Abam, M. de Berg, P. Hachenberger, and A. Zarei. Streaming algorithms for line simplification. In Proceedings of the Annual Symposium on Computational Geometry, pages 175–183, 2007.
  • [2] M.A. Abam, M. de Berg, P. Hachenberger, and A. Zarei. Streaming algorithms for line simplification. Discrete and Computational Geometry, 43:497–515, 2010. doi:10.1007/S00454-008-9132-4.
  • [3] P.K. Agarwal, S. Har-Peled, N.H. Mustafa, and Y. Wang. Near-linear time approximation algorithms for curve simplification. Algorithmica, 42(3):203–219, 2005. doi:10.1007/S00453-005-1165-Y.
  • [4] S. Bereg, M. Jiang, W. Wang, B. Yang, and B. Zhu. Simplifying 3D polygonal chains under the discrete Fréchet distance. In Proceedings of the Latin American Symposium on Theoretical Informatics, pages 630–641, 2008.
  • [5] K. Bringmann and B.R. Chaudhury. Polyline simplification has cubic complexity. In Proceedings of the International Symposium on Computational Geometry, pages 18:1–18:16, 2019.
  • [6] H. Cao, O. Wolfson, and G. Trajcevski. Spatio-temporal data reduction with deterministic error bounds. The VLDB Journal, 15(3):211–228, 2006. doi:10.1007/S00778-005-0163-7.
  • [7] M. Chen, M. Xu, and P. Fränti. A fast o(n) multiresolution polygonal approximation algorithm for gps trajectory simplification. IEEE Transactions on Image Processing, 21(5):2770–2785, 2012.
  • [8] S.-W. Cheng and H. Huang. Curve simplification and clustering under Fréchet distance. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 1414–1432, 2023.
  • [9] S.-W. Cheng and H. Huang. Solving Fréchet distance problems by algebraic geometric methods. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 4502–4513, 2024.
  • [10] A. Driemel, A. Krivošija, and C. Sohler. Clustering time series under the Fréchet distance. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 766–785, 2016.
  • [11] A. Driemel, I. Psarros, and M. Schmidt. Sublinear data structures for short Fréchet queries, 2019. arXiv:1907.04420. arXiv:1907.04420.
  • [12] A. Filtser and O. Filtser. Static and streaming data structures for Fréchet distance queries. ACM Transactions on Algorithms, 19(4):39:1–39:36, 2023. doi:10.1145/3610227.
  • [13] Y. Gao, Z. Fang, J. Xu, S. Gong, C. Shen, and L. Chen. An efficient and distributed framework for real-time trajectory stream clustering. IEEE Transactions on Knowledge and Data Engineering, 36(5):1857–1873, 2024. doi:10.1109/TKDE.2023.3312319.
  • [14] M. Godau. A natural metric for curves - computing the distance for polygonal chains and approximation algorithms. In Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science, pages 127–136, 1991.
  • [15] L.J. Guibas, J.E. Hershberger, J.S.B. Mitchell, and J.S. Snoeyink. Approximating polygons and subdivisions with minimum-link paths. International Journal of Computational Geometry & Applications, 3(4):383–415, 1993. doi:10.1142/S0218195993000257.
  • [16] X. Lin, J. Jiang, S. Ma, Y. Zuo, and C. Hu. One-pass trajectory simplification using the synchronous Euclidean distance. The VLDB Journal, 28:897–921, 2018.
  • [17] X. Lin, S. Ma, J. Jiang, Y. Hou, and T. Wo. Error bounded line simplification algorithms for trajectory compression: an experimental evaluation. ACM Transactions on Database Systems, 46(3):11:1–11:44, 2021. doi:10.1145/3474373.
  • [18] X. Lin, S. Ma, H. Zhang, T. Wo, and J. Huai. One-pass error bounded trajectory simplification. In Proceedings of the VLDB Endowment, volume 10, pages 841–852, 2017. doi:10.14778/3067421.3067432.
  • [19] J. Liu, K. Zhao, P. Sommer, S. Shang, B. Kusy, and R. Jurdak. Bounded quadrant system: error-bounded trajectory compression on the go. In Proceedings of the IEEE International Conference on Data Engineering, pages 987–998, 2015.
  • [20] J. Muckell, J.-H. Hwang, V. Patil, C.T. Lawson, F. Ping, and S. Ravi. SQUISH: an online approach for GPS trajectory compression. In Proceedings of the International Conference on Computing for Geospatial Research & Applications, pages 13:1–13:8, 2011.
  • [21] J. Muckell, P.W. Olsen, J.-H. Hwang, C.T. Lawson, and S. Ravi. Compression of trajectory data: a comprehensive evaluation and new approach. GeoInformatica, 18(3):435–460, 2014. doi:10.1007/S10707-013-0184-0.
  • [22] S. Muthukrishnan. Data Streams: Algorithms and Applications. now Publishers Inc., 2005.
  • [23] M. Potamias, K. Patroumpas, and T. Sellis. Sampling trajectory streams with spatiotemporal criteria. In Proceedings of the IEEE International Conference on Scientific and Statistical Database Management, pages 275–284, 2006.
  • [24] M. van de Kerkhof, I. Kostitsyna, M. Löffler, M. Mirzanezhad, and C. Wenk. Global curve simplification. In Proceedings of the European Symposium on Algorithms, pages 67:1–67:14, 2019.
  • [25] T. van der Horst and T. Ophelders. Computing minimum complexity 1D curve simplifications under the Fréchet distance. In The 39th European Workshop on Computational Geometry, 2023.
  • [26] M. van Kreveld, M. Löffler, and L. Wiratma. On optimal polyline simplification using the Hausdorff and Fréchet distance. In Proceedings of the International Symposium on Computational Geometry, pages 56:1–56:14, 2018.
  • [27] Z. Wang, C. Long, G. Cong, and Q. Zhang. Error-bounded online trajectory simplification with multi-agent reinforcement learning. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 1758–1768, 2021.
  • [28] D. Zhang, M. Ding, D. Yang, Y. Liu, J. Fan, and H.T. Shen. Trajectory simplification: an experimental study and quality analysis. In Proceedings of the VLDB Endowment, volume 11, pages 934–946, 2018. doi:10.14778/3213880.3213885.