Abstract 1 Introduction 2 Definitions and preliminaries on trajectory sample databases 3 Syntax, semantics and evaluation of (𝓡,𝓣)-queries 4 The complexity of the (𝓡,𝓣)-realisability problem 5 Conclusion References

On the Complexity of the Realisability Problem for Visit Events in Trajectory Sample Databases

Arthur Jansen111Corresponding author ORCID Hasselt University, Databases and Theoretical Computer Science Group and
Data Science Institute (DSI), Agoralaan, Building D, 3590 Diepenbeek, Belgium
Bart Kuijpers ORCID Hasselt University, Databases and Theoretical Computer Science Group and
Data Science Institute (DSI), Agoralaan, Building D, 3590 Diepenbeek, Belgium
Abstract

Trajectory sample databases store finite sequences of measured space-time locations of moving objects, along with a speed bound for each object. These databases can be seen as uncertain databases. We propose a language that allows the formulation of queries about the uncertainty in trajectory sample databases. As part of that language, we introduce the notion of visit events, which are used to describe certain constraints on the movement of an object. In our language, an atomic query asks whether a moving object can, given its limitations, realise such an event. We give complexity results for this realisability problem, in various settings.

Keywords and phrases:
Trajectory sample databases, uncertain databases, query languages, complexity
Funding:
Arthur Jansen: Bijzonder Onderzoeksfonds (BOF22OWB06) from UHasselt
Copyright and License:
[Uncaptioned image] © Arthur Jansen and Bart Kuijpers; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Information systems Spatial-temporal systems
; Information systems Query languages
Editors:
Thierry Vidal and Przemysław Andrzej Wałęga

1 Introduction

Due to the proliferation of location-aware devices (such as GPS receivers) in the past two decades, one of the use-cases of moving object databases [8] is the storage of time-stamped measured locations of moving objects [7]. Such a sequence of spatio-temporal measurements of a single moving object is called a trajectory sample. Given this type of partial information on a moving object, we do not know the precise space-time path (or trajectory) which the object has followed, but we do know that the trajectory must have passed these measured spatio-temporal locations. However, without making further assumptions, there are no theoretical limits to the movement of the object in between two measurements. An assumption that originates from the area of time geography, where the moving object’s accessibility to an environment is studied, is that we know a bound on the speed of the moving object [5, 9, 13]. Therefore, it is common to associate a maximal speed to each moving object, alongside a trajectory sample. With this additional knowledge, the actual trajectory of a moving object is guaranteed to be contained in a spatio-temporal region known as a “lifeline necklace” in spatio-temporal and moving object databases [6, 10, 18], or simply as a “chain of space-time prisms” in the fields of time geography [9] and Geographical Information Systems (GIS) [15, 12, 14].

We refer to a moving object database with the specific purpose of storing trajectory samples and speed bounds as a trajectory sample database. A trajectory sample database can be seen as an uncertain (or incomplete) database [1, 11]. In general, an uncertain database represents a set of “possible worlds”, where each possible world is a concrete instantiation of the data. In our setting, a possible world corresponds to an assignment of a trajectory to every object, satisfying the known limitations of that object. In this paper, we propose a query language for trajectory sample databases that is based on events that are possibly semantically interesting for some application and that may occur during a trajectory of a moving object. We introduce visit events as part of our language, which are used to describe constraints on trajectories. The most basic visit event expresses that a trajectory visits a particular region during a particular period of time. To allow for composition, the class of visit events is closed under Boolean combinations (using negation, conjunction and disjunction). For example, we could express the complex event that states that an object has visited a museum in the morning, some restaurant at lunch and has not been in a particular church in the afternoon. For a dataset of tourists visiting some city, this event may have occurred during the movement of some of the tourists. This example shows that our proposed query language is related to the field of semantic trajectories and places of interest (POIs) in such trajectories and could be used to match trajectories with event patterns [4, 17, 21]. The queries appearing in [20], where a cylinder model of uncertainty is used, are similar to what our languages can express.

The atomic query in our language asks whether a given visit event is realisable by the trajectory of some moving object, that is, whether there exists a trajectory that satisfies the constraints described by that event. We call the evaluation of this atomic query the realisability problem. To allow for composition on the query level, the query expressions are also closed under Boolean combinations. An important detail is that our query language is defined relative to two parameters: , denoting the class of spatial regions that can occur in visit events, and 𝒯, denoting the periods of time that can occur in visit events. We study the complexity of the realisability problem for different choices of and 𝒯. For , we consider the class of singleton points (𝖯𝗈𝗂𝗇𝗍) and of semi-algebraic sets (𝖲𝖾𝗆𝗂𝖠𝗅𝗀). For 𝒯, on the other hand, we consider the class of singleton moments (𝖬𝗈𝗆𝖾𝗇𝗍) and intervals of time (𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅). Because the realisability problem already becomes NP-hard for very restrictive classes of events, and because we believe it is of interest to measure the influence that the size of the database has on the complexity of query evaluation, we distinguish between combined complexity, data complexity and query complexity [1]. A summary of our complexity results is displayed in Table 1.

Table 1: A summary of the complexity results: data, query, and combined complexity.
Class of events Data Query Combined
(𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-event in DNF linear polynomial polynomial
Positive (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-events linear NP-hard NP-hard
(𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-events linear NP-hard NP-hard
Conjunctive (𝖯𝗈𝗂𝗇𝗍,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-events polynomial NP-hard NP-hard
Positive (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-events polynomial NP-hard NP-hard

The paper is organised as follows. In Section 2, we give the definitions that are necessary to formalize the notion of a trajectory sample database. In Section 3, we define the syntax and the semantics of our query languages. In Section 4, we study the complexity of the realisability problem in different settings. Finally, we conclude the paper with Section 5.

2 Definitions and preliminaries on trajectory sample databases

In this section, we give definitions of the concepts needed to introduce the notion of a trajectory sample database.

We consider the space in which objects move to be the plane 2, and we use the letters p,q, (with or without indices) to denote locations in space. Time is modelled as the real line , and we use t (with or without indices) to refer to the temporal points (or moments). We also use x and y to refer to the real coordinates of spatial locations. A spatio-temporal point (p,t) is then an element of 2× and if p=(x,y), we also write (x,y,t) for the spatio-temporal point (p,t). Furthermore, we use d to denote the Euclidean distance in 2.

The movement of an object is captured by the notion of “trajectory” and it corresponds to a function mapping (all possible) moments in time to locations in space, as is expressed by the following definition.

Definition 1.

A trajectory is a continuous function from to 2.

We use the Greek letter γ (with or without indices) to refer to trajectories. In practice trajectories are only observed or measured at discrete moments in time and we call these partial views on trajectories “trajectory samples”.

Definition 2.

A trajectory sample (or sample, for short) is a finite sequence of space-time points (p1,t1),,(pn,tn) that is ordered by the temporal component (that is, t1<<tn).

We use the letter S (with or without indices) to refer to trajectory samples. The following definition captures the notion of trajectories matching trajectory samples.

Definition 3.

We say a trajectory γ visits a space-time point (p,t) when γ(t)=p and we say that a trajectory γ visits a subset of space-time if γ visits one of its elements.

A trajectory sample (p1,t1),,(pn,tn) matches the trajectory γ if γ visits all the space-time points (pi,ti), for i=1,,n.

Trajectory sample databases do not only store a trajectory sample for each moving object, but also a maximal speed for each moving object. Because we assume that every moving object has such speed bound, these bounds further restrict trajectories (as trajectory samples do). This is capured in the following definition.

Definition 4.

A trajectory γ is called vmax-bounded when for all t,t we have d(γ(t),γ(t))vmax|tt|.

We use Γ(S,vmax) to denote the set of all vmax-bounded trajectories matching the sample S. Obviously, some trajectory samples have no vmax-bounded trajectories that match them. The consistency of a sample with a speed bound vmax is expressed in the following definition.

Definition 5.

A trajectory sample (p1,t1),,(pn,tn) is called vmax-consistent when d(pi,pi+1)vmax(ti+1ti), for i=1,,n1.

In the remainder of this paper, we assume, when given a trajectory sample S and a speed bound vmax, that S is vmax-consistent.

Definition 6.

The linear interpolation trajectory of a trajectory sample S=(p1,t1),,(pn,tn), denoted by 𝖫𝖨𝖳(S), is defined as the trajectory γ, with

γ(t)={p1if tt1ti+1tti+1tipi+ttiti+1tipi+1if ti<tti+1  (for 1i<n)pnif tn<t.

We note that if S is vmax-consistent, then 𝖫𝖨𝖳(S) is a vmax-bounded trajectory.

In trajectory sample databases, we use natural numbers as identifiers for moving objects. Therefore, such database is defined relative to a finite subset 𝖮𝖻𝗃 of , called the object identifiers.

Definition 7.

A trajectory sample database D over object identifiers 𝖮𝖻𝗃 is a function mapping each identifier i𝖮𝖻𝗃 to a pair D(i)=(SD(i),vD(i)), where SD(i) is a trajectory sample and vD(i) is a speed bound.

This means that, given a database D, the set ΓD(i)=Γ(SD(i),vD(i)) contains all trajectories that the object with identifier i may have followed (given the sample and the speed bound).

3 Syntax, semantics and evaluation of (𝓡,𝓣)-queries

In this section, we describe a family of query languages for trajectory sample databases. Our languages consists of two “tiers”: in the inner tier, we have the events, which are used as subexpressions in the outer tier, where we have the query expressions. We define the language relative to two parameters, and 𝒯, where is a collection of spatial regions (or subsets of 2), and 𝒯 is a collection of temporal periods (or subsets of ).

In the following subsections, we define the syntax of the query languages, their semantics, and we end with a remark on the evaluation of query expressions.

3.1 The syntax of (𝓡,𝓣)-queries

We start by defining the inner tier of our language, which is a calculus of events. These events occur as subexpressions in the query expressions defined later on. Events express visits that may occur during the movement of an object.

Definition 8.

We define the (,𝒯)-events as follows:

  1. 1.

    𝗏𝗂𝗌𝗂𝗍𝗌(R,T), with R and T𝒯, is an atomic (,𝒯)-event;

  2. 2.

    If e,e1 and e2 are (,𝒯)-events, then so are (¬e), (e1e2) and (e1e2).

In other words, (,𝒯)-events can be considered as propositional formulas over the propositional symbols 𝗏𝗂𝗌𝗂𝗍𝗌(R,T), for every R and T𝒯. Now, we are ready to define (,𝒯)-queries. Their expression is given in the following definition.

Definition 9.

Let 𝖵𝖺𝗋 be a set of object identifier variables (or variables, for short). The (,𝒯)-query expressions are defined as follows:

  1. 1.

    If e is an (,𝒯)-event, i a natural number and x𝖵𝖺𝗋, then 𝗋𝖾𝖺𝗅𝗂𝗌𝖺𝖻𝗅𝖾(i,e) and 𝗋𝖾𝖺𝗅𝗂𝗌𝖺𝖻𝗅𝖾(x,e) are atomic (,𝒯)-query expressions;

  2. 2.

    If q,q1 and q2 are (,𝒯)-query expression, then so are (¬q), (q1q2) and (q1q2).

3.2 The semantics of (𝓡,𝓣)-queries

To define the semantics of query expressions, we first define what it means for an event to be realised by a trajectory.

Definition 10.

Let e be an (,𝒯)-event and let γ be a trajectory.

  1. 1.

    If e is the atomic event 𝗏𝗂𝗌𝗂𝗍𝗌(R,T), then e is realised by γ if γ(t)R for some tT (that is, γ visits R×T).

  2. 2.

    If e is of the form (¬e1), then e is realised by γ when e1 is not realised by γ.

  3. 3.

    If e is of the form (e1e2), then e is realised by γ when e1 and e2 are realised by γ.

  4. 4.

    If e is of the form (e1e2), then e is realised by γ when e1 or e2 is realised by γ.

In our definition of the semantics of (,𝒯)-queries, we distinguish between Boolean and non-Boolean queries. The first type of queries contain no variables and they evaluate to true or false. The second type of queries contain variables and they define a relations over 𝖮𝖻𝗃.

Definition 11.

Let D be a trajectory sample database over object identifiers 𝖮𝖻𝗃. We write Dq to express that a variable-free (,𝒯)-query expression q evaluates to true on D. This relation is defined as follows:

  1. 1.

    D𝗋𝖾𝖺𝗅𝗂𝗌𝖺𝖻𝗅𝖾(i,e) if i𝖮𝖻𝗃 and there exists a trajectory in ΓD(i) that realises e.

  2. 2.

    D¬q if Dq does not hold.

  3. 3.

    Dq1q2 if Dq1 and Dq2.

  4. 4.

    Dq1q2 if Dq1 or Dq2.

Definition 12.

Let D be a trajectory sample database over object identifiers 𝖮𝖻𝗃. If q is an (,𝒯)-query expression containing variables x1,,xk𝖵𝖺𝗋, then the result of q evaluated in D is

q(D)={(i1,,ik)𝖮𝖻𝗃DkDq[i1/x1,,ik/xk]},

where q[i1/x1,,ik/xk] is obtained from q by instantiating the variable xj in q by ij, for j=1,,k.

3.3 Evaluation of (𝓡,𝓣)-queries

Having defined the semantics of our (,𝒯)-query languages, there is a “standard” way of evaluating query expressions with variables: given a query expression with k variables, we enumerate all k-tuples of object identifiers, consider all the instantiations associated with them and then evaluate the variable-free expression obtained by substituting the variable occurrences by the concrete object identifiers from this tuple.

However, it is not immediately clear from the definition of the semantics how an atomic query of the form 𝗋𝖾𝖺𝗅𝗂𝗌𝖺𝖻𝗅𝖾(i,e) can be evaluated. In what follows, we restrict our attention to this decision problem.

Definition 13.

We define the (,𝒯)-realisability problem to be the following decision problem: given (,𝒯)-event e, sample S and speed bound v, does there exist a trajectory in Γ(S,v) that realises e?

From now on, we use the notation (S,v)e to express that there exists a trajectory in Γ(S,v) that realises the event e.

4 The complexity of the (𝓡,𝓣)-realisability problem

In this section, we give a number of complexity results on the above realisability problem. We recall that the input to this problem is a trajectory sample, a speed bound and an (,𝒯)-event. In order for the realisability problem to be a proper computational decision problem, these inputs must have finite representations. This means, for example, that we cannot take arbitrary real numbers as input and that we need to assume some finite encoding of our inputs. Here, we assume that the spatio-temporal points occurring in the trajectory sample have rational coordinates, and that the speed bound is rational. As for the (,𝒯)-event, its representation depends on the choice for and 𝒯 and the representation of their elements. The choices for that we consider are 𝖯𝗈𝗂𝗇𝗍, the collection of all singletons in 2; and 𝖲𝖾𝗆𝗂𝖠𝗅𝗀, the collection of all semi-algebraic sets in the plane. A semi-algebraic set in the plane is a subset of 2 that can be defined using a Boolean combination of polynomial (in)equalities over two real variables (where the polynomials have integer coefficients) [3]. Elements of 𝖯𝗈𝗂𝗇𝗍 are simply represented by the coordinates of the point in question, while an element of 𝖲𝖾𝗆𝗂𝖠𝗅𝗀 is represented by some encoding of its defining formula. The choices for 𝒯 we consider are 𝖬𝗈𝗆𝖾𝗇𝗍, containing all singletons in , and 𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅, containing all closed intervals of with rational endpoints.

For notational convenience, the atomic (𝖯𝗈𝗂𝗇𝗍,𝒯)-event 𝗏𝗂𝗌𝗂𝗍𝗌({p},T) will be written as 𝗏𝗂𝗌𝗂𝗍𝗌(p,T). And similarly, we will write 𝗏𝗂𝗌𝗂𝗍𝗌(R,t) for the atomic (,𝖬𝗈𝗆𝖾𝗇𝗍)-event 𝗏𝗂𝗌𝗂𝗍𝗌(R,{t}).

As we show below, the realisability problem is already NP-hard for quite restricted classes of events. Similar to the evaluation of queries in relational databases, the hardness of the realisability problem is caused by the size of the event, and not by the size of the trajectory sample. Therefore, we study the complexity of the realisability problem in three different settings [1], being:

  • the data complexity, where we measure the complexity in terms of the size of the sample, and consider the event to be fixed,

  • the query complexity, where we measure the complexity in terms of the size of event, and consider the sample to be fixed, and

  • the combined complexity, where we measure the complexity in terms of both the size of the sample, and the size of the event.

We also consider several restrictions to the class of input events, such as positive events, containing no negations, conjunctive events, being conjunctions of atoms (or, not containing disjunctions and negations), and events in disjunctive normal form (DNF).

In the remainder of this section, we give various results on the complexity of the (,𝒯)-realisability problem, for different choices of and 𝒯 and in the three different settings.

For the complexity results mentioned below, we use a computational model in which operations (addition, multiplication, …) and comparison relations (=, <, …) on rational numbers are assumed to take unit time. That is, we measure the time complexity in terms of the number of spatio-temporal points in a trajectory sample and in terms of number of atoms (and their length) in an event-expression.

4.1 The query complexity of the (𝓡,𝓣)-realisability problem

Our first result shows that the realisability problem is already NP-hard for a relatively restricted class of events, namely the positive (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-events.

Theorem 14.

In terms of query complexity, the (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem for positive events is NP-hard.

Proof.

To prove NP-hardness, we describe a reduction from SAT, the satisfiability problem of propositional formulas. Because the statement concerns query complexity, we reduce SAT to the (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem with fixed sample and speed bound. In this case, we choose the sample S=((0,0),0),((0,0),1) and the speed bound v=1. The input of the reduction is a propositional formula ϕ. We can assume that ϕ is in negation normal form333A formula is said to be in negation normal form if negation operators are only applied to atoms., because the satisfiability problem remains NP-hard under this restriction. The output is a positive (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-event eϕ such that the formula ϕ is satisfiable iff there exists a trajectory γ in Γ(S,v) that realises the event eϕ.

Let P1,Pk be the propositional symbols occuring in ϕ. For 1ik, we define ti=i+1k+2, pi=(12(k+2),0) and pi=(12(k+2),0). Now, we take eϕ to be the result of substituting occurences of ¬Pi by 𝗏𝗂𝗌𝗂𝗍𝗌(pi,ti) and non-negated occurrences of Pi by 𝗏𝗂𝗌𝗂𝗍𝗌(pi,ti) in ϕ. Clearly, eϕ is a (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-event, not containing negations.

Finally, we show that ϕ is satisfiable if and only if there exists a γ in Γ(S,v) that realises eϕ. The “if”-direction is straightforward. If there is some γ that realises eϕ, then ϕ must certainly be satisfiable. To be precise, the assignment that assigns Pi to true if and only if γ(ti)=pi satisfies ϕ. For the “only if”-direction, assume that ϕ is satisfiable. Then there exists a truth assignment α that makes ϕ true. Now, we extend the sample S to a sample S such that it contains (pi,ti) if Pi is assigned true by α, and otherwise contains (pi,ti). It is easily verified that S is v-consistent, which means 𝖫𝖨𝖳(S) is a v-bounded trajectory. Now, we have that 𝖫𝖨𝖳(S) realises 𝗏𝗂𝗌𝗂𝗍𝗌(pi,ti) if Pi is true under α, and realises 𝗏𝗂𝗌𝗂𝗍𝗌(pi,ti) if ¬Pi is true under α. It follows from the way we constructed the event, that 𝖫𝖨𝖳(S) realises eϕ.

While the above result shows that the (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem is NP-hard for positive events (not containing negations), the problem for (𝖯𝗈𝗂𝗇𝗍,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-events already becomes NP-hard for conjunctions of atoms, as shown below.

Theorem 15.

In terms of query complexity, the (𝖯𝗈𝗂𝗇𝗍,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-realisability problem for conjunctive events is NP-hard.

Proof.

We give a reduction from the Euclidean travelling saleman problem (E-TSP for short), shown to be NP-hard in [16] (the problem we refer to here is called the Euclidean tour-TSP there). The Euclidean travelling saleman problem asks, given a finite set of locations P2 and a positive number , whether there is a cycle through all locations of P whose length is at most . Formally, this means there is a permutation p1,,pn of the locations in P such that i=1n1d(pi,pi+1)+d(pn,p1). Without loss of generality, we assume that P always contains the origin (0,0).

Again, we work with a fixed sample S=((0,0),0) and speed bound v=1. From an instance P, of E-TSP, we give a conjunction of (𝖯𝗈𝗂𝗇𝗍,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-atoms C, such that (S,v)C if and only if there is cycle through P of length at most . We define C as

𝗏𝗂𝗌𝗂𝗍𝗌((0,0),[,])pP{(0,0)}𝗏𝗂𝗌𝗂𝗍𝗌(p,[0,]).

To prove that this reduction is correct, we first show that if (S,v)C, then there is cycle through P of length at most . Let γ be a v-bounded trajectory matching S and realising C. Because γ matches S, we have γ(0)=(0,0). And, because γ realises C, it reaches every location in P at some moment in [0,], and γ()=(0,0). This induces an order p1,,pn of the locations in P, where γ first reaches p1, then p2, and so on (we note that p1 is always (0,0)). Thus, if we let ti be the first moment where γ(ti)=pi for i=1,,n, then 0=t1<<tn. We claim the cycle p1,,pn,p1 has length at most . The length of this cycle is i=1n1d(pi,pi+1)+d(pn,p1). For every i, γ visits (pi,ti), and because γ is v-bounded, we have d(pi,pi+1)v|titi+1|=ti+1ti. Similarly, γ visits (pn,tn) and (p1,), thus d(pn,p1)tn. It follows that the length of the cycle is at most i=1n1(ti+1ti)+tn=tnt1+tn=.

Finally, we show that if there is cycle through P of length at most , then (S,v)C. Let p1,,pn,p1 be such cycle, where we choose p1 to be (0,0). Now, let t1=0 and for i=2,,n, take ti=ti1+d(pi1,pi). Then, tn=i=1n1d(pi,pi+1), which, by assumption, is at most d(pn,p1). Define the trajectory sample S=(p1,t1),,(pn,tn),(p1,). It is clear from the definition of ti that the part of S excluding (p1,) is v-consistent. We have seen that tnd(pn,p1), and thus d(pn,p1)tn, which implies that S is v-consistent. From this follows that 𝖫𝖨𝖳(S) is v-bounded, and it clearly matches S and realises C.

4.2 The data complexity of the (𝓡,𝓣)-realisability problem

In this section, we show that the (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem has linear-time data complexity, and the (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-realisability problem has polynomial-time data complexity

We start with the result on (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-queries, but first we introduce some notation and we give two lemmas. Every region in 𝖲𝖾𝗆𝗂𝖠𝗅𝗀 is of the form {(x,y)2φ(x,y)}, where φ=φ(x,y) is a quantifier-free formula over the vocabulary (+,×,<,0,1), with x and y as free variables. The set {(x,y)2φ(x,y)} is called the region defined by φ, and we denote it by R(φ).

Definition 16.

If C=𝗏𝗂𝗌𝗂𝗍𝗌(R1,t1)𝗏𝗂𝗌𝗂𝗍𝗌(Rk,tk) is a conjunction of atomic (,𝖬𝗈𝗆𝖾𝗇𝗍)-events, and I is an interval, the formula CI is the conjunction of those 𝗏𝗂𝗌𝗂𝗍𝗌(Ri,ti), with 1ik, for which tiI.

Lemma 17.

Let C be a conjunction of atomic (,𝖬𝗈𝗆𝖾𝗇𝗍)-events, let S be a trajectory sample (p1,t1),,(pn,tn) and let v be a speed bound. Then, (S,v)C if and only if all of the following are true:

  1. (1)

    ((p1,t1),v)C(,t1],

  2. (2)

    for i=1,,n1, we have ((pi,ti),(pi+1,ti+1),v)C[ti,ti+1], and

  3. (3)

    ((pn,tn),v)C[tn,+).

Proof.

The “only if”-direction is obvious. We prove the “if”-direction. Assume (1), (2) and (3) are true. By (1), there exists a v-bounded trajectory γ0 matching (p1,t1) that realises C(,t1]. By (2), for i=1,,n1, there exists a v-bounded trajectory γi matching (pi,ti),(pi+1,ti+1) that realises C[ti,ti+1]. And by (3), there exists a v-bounded trajectory γn matching (pn,tn) that realises C[tn,+). We define the trajectory γ as follows:

γ(t)={γ0(t)if t(,t1],γi(t)if t[ti,ti+1] and γn(t)if t[tn,+).

We note that the intervals [ti1,ti] and [ti,ti+1] both contain ti. However, this does not pose a problem for the definition of γ, because both γi1 and γi visit (pi,ti), which means γi1(ti)=γi(ti)=pi. It only requires a simple application of the triangle inequality (of Euclidean distance) to show that γ is a v-bounded trajectory. For i=1,,n, we have γ(ti)=pi, thus γ matches S. The only thing left to prove is that γ realises C. Let A=𝗏𝗂𝗌𝗂𝗍𝗌(R,t) be an arbitrary conjunct of C. Depending on t, there are three cases to be considered. First, if t(,t1], then A is a conjunct of C(,t1]. This implies γ0 realises A, so γ(t)=γ0(t)R, which means γ realises A. The other two cases, when t is contained in [ti,ti+1] or in [tn,+), are similar. Because γ realises all the conjuncts of C, it also realises C itself.

Lemma 18.

If C=𝗏𝗂𝗌𝗂𝗍𝗌(R1,t1)𝗏𝗂𝗌𝗂𝗍𝗌(Rk,tk) is a conjunction of atomic (,𝖬𝗈𝗆𝖾𝗇𝗍)-events, S is a trajectory sample and v a speed bound, then (S,v)C if and only if there exist k spatial locations p1,,pk such that

  1. (1)

    for i=1,,k we have piRi, and

  2. (2)

    the sample containing the points (p1,t1),,(pk,tk), as well as the ones in S, is v-consistent.

We remark that we consider condition (2) from the lemma to be false when some space-time point among (p1,t1),(pk,tk) shares its temporal component with a point in S, while their spatial component differs.

Proof.

The “only if”-direction is obvious. We prove the “if”-direction. Assume there are locations p1,,pk satisfying (1) and (2). Now take the trajectory γ to be the linear interpolation trajectory of the sample containing the points (p1,t1),,(pk,tk), as well as the ones in S. It is clear that γ matches S and assumption (2) implies it is a v-bounded trajectory. Finally, assumption (1) implies that γ realises C.

Definition 19.

We say two (,𝒯)-events A and B are equivalent if for every sample S and speed bound v, we have (S,v)A if and only if (S,v)B.

The following proposition follows directly from the fact that, for p2, we have pR(φ) if and only if pR(¬φ).

Proposition 20.

A (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-event of the form ¬𝗏𝗂𝗌𝗂𝗍𝗌(R(φ),t) is equivalent to the event 𝗏𝗂𝗌𝗂𝗍𝗌(R(¬φ),t).

Given an arbitrary (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-event, we can convert it into negation normal form and use Proposition 20 to remove all negations. The result of this process is a positive (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-event that is equivalent to the original. Because this process can be performed in linear time (with respect to the length of the event), we can assume that any given (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-event is positive, without loss of generality.

Theorem 21.

In terms of data complexity, the (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem is decidable in linear time.

Proof.

We describe an algorithm to decide the realisability problem of a (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖬𝗈𝗆𝖾𝗇𝗍)-event e for input sample S=(p1,t1),,(pn,tn), where pi=(xi,yi), and speed bound v. Noting the remark made above, we assume that e is positive. The first step is to convert e into its disjunctive normal form e¯. Since we are dealing with data complexity, the possibly increased size of e¯, compared to e, has no impact on the running time of our method. In fact, because the number of disjuncts of e¯ is constant, it is sufficient to show that the realisability problem for a single disjunct can be decided in linear time.

Consider a disjunct C of e¯. Because e¯ is positive and in DNF, the event C must be a conjunction of atomic events. This means we can apply Lemma 17, and we can determine whether (S,v)C by testing whether each of the following conditions are met:

  1. (1)

    ((p1,t1),v)C(,t1],

  2. (2)

    for i=1,,n1, we have ((pi,ti),(pi+1,ti+1),v)C[ti,ti+1], and

  3. (3)

    ((pn,tn),v)C[tn,+).

We focus our attention to condition (2). For every value of i, we want to decide whether C[ti,ti+1] is realisable for sample (pi,ti),(pi+1,ti+1) and speed bound v. Let us write the conjunction C[ti,ti+1] as 𝗏𝗂𝗌𝗂𝗍𝗌(R(φ1),t1)𝗏𝗂𝗌𝗂𝗍𝗌(R(φk),tk), with t1tk. We remark that this implies tit1tkti+1. From Lemma 18, we know that ((pi,ti),(pi+1,ti+1),v)C[ti,ti+1] if and only if there exist locations q1,,qk2, with qi=(xi,yi), such that

  1. (a)

    for j=1,,k we have qjR(φj), and

  2. (b)

    the sample (pi,ti),(q1,t1),,(qk,tk),(pi+1,ti+1) is v-consistent.

In other words, we have ((pi,ti),(pi+1,ti+1),v)C[ti,ti+1] if and only if the formula ψ=x1y1xkyk(ψaψb) is true, where ψa is φ1(x1,y1)φk(xk,yk), expressing condition (a), and to express condition (b), we take ψb to be the conjunction of k+1 distance inequalities.

Because ψ is a first-order logic sentence over the ordered field of real numbers, its truth can be determined by a decision procedure for the theory of real closed fields (first described by Tarski [19], we refer to Basu et al. [2] for a modern exposition). We note that the size of ψ is independent of n, and thus has constant size in the data complexity setting (k is bounded by the length of C). The time needed to determine the truth of ψ is thus also constant. To test for condition (2), we have to perform the above steps for n1 values of i. Conditions (1) and (3) can both be tested in constant time, in a manner similar to the above, the only difference is that the constructed sentence requires one less inequality for expressing v-consistency. We have thus shown that the realisability problem for a single disjunct of e¯ can be answered in O(n) time. This concludes the proof.

Our next result concerns the data complexity of the (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-realisability problem for positive events.

Lemma 22.

If e is a positive (,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-event containing k distinct atoms A1,,Ak, where Ai=𝗏𝗂𝗌𝗂𝗍𝗌(Ri,Ti), then (S,v)e if and only if there exist k space-time points (p1,t1),,(pk,tk) such that

  1. (1)

    the sample containing the points (p1,t1),,(pk,tk), as well as the ones in S, is v-consistent, and

  2. (2)

    the Boolean expression obtained by replacing Ai in e by true if (pi,ti)Ri×Ti, and by false otherwise, evaluates to true.

Proof.

We first prove the “if”-direction. Assume that there exist (p1,t1),,(pk,tk) satisfying (1) and (2). Let S be the sample containing the points (p1,t1),,(pk,tk), as well as the ones in S. Assumption (1) says S is v-consistent, which means 𝖫𝖨𝖳(S) is v-bounded. Because S is an extension of S, and γ matches S, it must also match S. The trajectory 𝖫𝖨𝖳(S) realises all the atoms Ai for which (pi,ti)Ri×Ti, and potentially others. Because of assumption (2) and the fact that e is positive, 𝖫𝖨𝖳(S) realises e, and thus (S,v)e.

For the “only if”-direction, we assume that (S,v)e. That means that there exists a v-bounded γ realising e and matching S. We choose (p1,t1),,(pk,tk) as follows. For every atom Ai which γ realises, we know that γ visits some point in Ri×Ti, and take (pi,ti) to be such point. For an atom Ai not realised by γ, we take (quite arbitrarily) (pi,ti) to be the first anchor point of S. Then, (pi,ti)Ri×Ti, since otherwise γ would realise Ai. All points of (p1,t1),,(pk,tk) and S are visited by γ, so the sample containing all those points must be v-consistent, satisfying condition (1). Because γ realises Ai if and only if (pi,ti)Ri×Ti and γ realises e, condition (2) is also satisfied.

The following proposition follows directly from the definition of v-consistency and the fact that the distance function d obeys the triangle inequality.

Proposition 23.

If M is an (unordered) finite set of space-time points, then the sample containing all points in M is v-consistent if and only if for every pair of points (p1,t1) and (p2,t2) from M, we have d(p1,p2)v|t1t2|.

It will be of interest later that the condition d(p1,p2)v|t1t2| from the above proposition is equivalent to d(p1,p2)2v2(t1t2)2, which, if p1=(x1,y1) and p2=(x2,y2), is a polynomial inequality with variables x1,y1,t1,x2,y2,,t2,v.

Theorem 24.

In terms of data complexity, the (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-realisability problem for positive events can be decided in polynomial time.

Proof.

We describe an algorithm to decide the realisability of a positive (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-event e for input sample S, of length n, and speed bound v. Lemma 22 gives a condition equivalent to (S,v)e. We can express this condition using a first-order logic sentence over the ordered field of real numbers ψ=x1y1t1xkyktk(ψ1ψ2), where ψ1 expresses part (1) of Lemma 22, and ψ2 expresses part (2). To express part (1), stating that the sample containing (x1,y1,t1),,(xk,yk,tk) as well as the points from S is v-consistent, we can take ψ1 to be a conjunction of (n+k)2 distance inequalities, as per Proposition 23. To express the second part, we construct ψ2 by taking e and replacing every atom 𝗏𝗂𝗌𝗂𝗍𝗌(R(φi),[ti,ti+]) by the formula φi(xi,yi)titititi+.

We have now constructed a formula ψ which is true if and only if (S,v)e. Thus, if there is a method to determine the truth ψ, we can decide the realisability problem for positive (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-events. Because ψ is an existantial sentence, we can apply known decision procedures for the existential theory of the reals. Of course, the time complexity required to decide the realisability of positive (𝖲𝖾𝗆𝗂𝖠𝗅𝗀,𝖨𝗇𝗍𝖾𝗋𝗏𝖺𝗅)-events in the described manner, depends on the time complexity of the existential theory of the reals. In [2] (see theorem 13.14), an upper bound of sm+1dO(m) is given, where s is the number of polynomials occuring in the formula, m the number of variables, and d the maximum degree of the polynomials. Because we are considering data complexity, the number of polynomials in the φi’s, as well as their maximum degree, is constant, and so is k. This implies that ψ contains O(n2) polynomials of constant degree, and 3k variables. Thus, by Theorem 13.14 from [2], the truth of ψ can be decided in O((n2)3k+1)=O(n6k+2) time, which is polynomial in n. It is also clear that ψ can be constructed in polynomial time.

4.3 A class of events for which the realisability problem has polynomial time combined complexity

Until now, we have only seen hardness results in the query (and thus, combined) complexity setting. In this section, we give an example of a class of events for which the realisability problem has polynomial-time combined complexity.

An (,𝒯)-literal is either an atomic (,𝒯)-event, or the negation of an atomic (,𝒯)-event. Conjunctions of (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-literals provide a class of events for which the realisability problem has an efficient solution in terms of combined complexity. Before giving our result, we first prove a lemma.

Lemma 25.

If S1 and S2 are trajectory samples, then there exists a v-bounded trajectory matching S1 and not visiting any point in S2 if and only if

  1. (1)

    S1 does not contain a point in S2,

  2. (2)

    S1 is v-consistent, and

  3. (3)

    if 𝖫𝖨𝖳(S1) visits a point from S2, on the line segment between points (pi,ti) and (pi+1,ti+1) from S1, then d(pi,pi+1)<v(ti+1ti).

Proof.

It is obvious that the conditions (1), (2) and (3) are necessary for the existence of a v-bounded trajectory matching S1 and not visiting points in S2. To show that they are also sufficient, we assume that all three conditions are met. Consider the trajectory 𝖫𝖨𝖳(S1). By condition (2), it is v-bounded. In case 𝖫𝖨𝖳(S1) visits one or more points from S2, we show that we can extend (by adding points) S1 to a sample S, such that 𝖫𝖨𝖳(S) is v-bounded, matches S1 and does not visit points from S2. Because S is obtained by adding points to S1, it is obvious that 𝖫𝖨𝖳(S) matches S1. Let us say that 𝖫𝖨𝖳(S1) visits a point (q,t) from S2 on the line segment between (pi,ti) and (pi+1,ti+1). From (1) we know that ti<t<ti+1, and from (3) we have d(pi,pi+1)<v(ti+1ti). Together, these observations imply there must be a small disk around q, such that for every p in this disk, the extension of S1 with the point (p,t) remains v-consistent. Informally, this means we can deviate 𝖫𝖨𝖳(S1) slightly between ti and ti+1, while the trajectory remains v-bounded. Now, for every point (q,t) of S2 with ti<t<ti+1, there is at most one choice for p inside the disk that makes the linear interpolation trajectory of the extended sample visit (q,t). Because there are infinitely many points in the disks, we can choose p such that, between ti and ti+1, none of the points from S2 are visited. We can repeat this process for each line segment of 𝖫𝖨𝖳(S1) that visits a point from S2. The linear interpolation trajectory of the resulting sample S then matches S1, is v-bounded and does not visit points from S2, as desired.

Theorem 26.

In terms of combined complexity, the (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem for a conjunction of k literals and a trajectory sample of length n is decidable in O(n+klogk) time.

Proof.

We assume that we are given a conjunction C of k (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-literals, a sample S of length n and a speed bound v as input. Let P1,,Pm be the positive atoms occuring in C and let N1,,N be the atoms occuring negated in C. First we compute the sample S1=(p1,t1),(pn+m,tn+m), containing both the points in S and the ones occuring in the atoms P1,,Pm. Computing this sample requires ordering the points occuring in P1,,Pm by time, and merging these with those from S (which are already ordered by time). This can be done in O(n+klogk) time. We also compute a sample S2, containing the points from N1,,N, in O(klogk) time. Now we have (S,v)C if and only if there exists a v-bounded trajectory matching S1 and not visiting any point in S2. This can be determined by testing for the conditions of Lemma 25 in O(|S1|+|S2|) time, where |S1|+|S2|=n+m+=n+k. Thus, the total time required by the described procedure is O(n+klogk).

Corollary 27.

In terms of combined complexity, the (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-realisability problem for an event in disjunctive normal form, of length k, and a trajectory sample of length n is decidable in O(kn+klogk) time.

Proof.

We assume that we are given a (𝖯𝗈𝗂𝗇𝗍,𝖬𝗈𝗆𝖾𝗇𝗍)-event e in disjunctive normal form, of length k, a sample S of length n and a speed bound v as input. Then, e is of the form C1C, where every disjunct Ci is a conjunction of literals. Let us say that the event Ci contains ki literals. Because e is realisable if and only if one of the Ci’s is realisable, we can use Theorem 26 to determine the realisability of e, by determining it for each of the disjuncts. This takes i=1O(n+kilogki) time, which is O(kn+klogk).

5 Conclusion

We have proposed a family of query languages for trajectory sample databases. Query expressions in these languages contain events, which are used to describe constraints on trajectories. When performing query evaluation, an essential problem required to be solved is that of the realisability of an event. We studied the complexity of this realisability problem in terms of data, query and combined complexity. These results are summarised in Table 1. These complexity results are given in a computational model wherein (arithmetic and comparison) operations on rational numbers take unit time. It is not clear whether our results remain true in a model in which we measure the cost of these operations in terms of the length of the bit-representation of rational numbers. Also, we did not give much attention to the evaluation of query expressions outside of the realisability problem. We assumed that query expressions are evaluated in some standard way, but more efficient strategies might exist.

References

  • [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  • [2] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in real algebraic geometry. Algorithms and Computation in Mathematics 10. Springer, Berlin, 2003.
  • [3] Jacek Bochnak, Michel Coste, and Marie-Françoise Roy. Real Algebraic Geometry, volume 36 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag, 1998.
  • [4] Vania Bogorny, Bart Kuijpers, and Luis Otávio Alvares. ST-DMQL: A semantic trajectory data mining query language. Int. J. Geogr. Inf. Sci., 23(10):1245–1276, 2009. URL: http://www.informaworld.com/smpp/content%7Edb=all%7Econtent=a915093101%7Efrm=abslink.
  • [5] L. Burns. Transportation, Temporal, and Spatial Components of Accessibility. Lexington Books, Lexington, MA, 1979.
  • [6] Max J. Egenhofer. Approximation of geospatial lifelines. In Elisa Bertino and Leila De Floriani, editors, SpadaGIS, Workshop on Spatial Data and Geographic Information Systems. University of Genova, 2003.
  • [7] Fosca Giannotti and Dino Pedreschi, editors. Mobility, Data Mining and Privacy - Geographic Knowledge Discovery. Springer, 2008. doi:10.1007/978-3-540-75177-9.
  • [8] R. Güting and M. Schneider. Moving Object Databases. Morgan Kaufmann, 2005.
  • [9] T. Hägerstrand. What about people in regional science? Papers of the Regional Science Association, 24:7–21, 1970.
  • [10] Kathleen Hornsby and Max J. Egenhofer. Modeling moving objects over multiple granularities. Ann. Math. Artif. Intell., 36(1-2):177–194, 2002. doi:10.1023/A:1015812206586.
  • [11] Tomasz Imieliński and Witold Lipski. Incomplete information in relational databases. J. ACM, 31(4):761–791, September 1984. doi:10.1145/1634.1886.
  • [12] Donald Janelle and Michael Goodchild. Diurnal patterns of social group distributions in a canadian city. Economic Geography, 59(4):403–425, 1983. URL: http://www.jstor.org/stable/144166.
  • [13] B. Lenntorp. Paths in Space-Time Environments: A Time-Geographic Study of the Movement Possibilities of Individuals. Number 44 in Series B. Lund Studies in Geography, 1976.
  • [14] H.J. Miller. Modeling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems, 5:287–301, 1991. doi:10.1080/02693799108927856.
  • [15] H.J. Miller. A measurement theory for time geography. Geographical Analysis, 2005. doi:10.1111/j.1538-4632.2005.00575.x.
  • [16] Christos H. Papadimitriou. The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science, 4(3):237–244, 1977. doi:10.1016/0304-3975(77)90012-3.
  • [17] Christine Parent, Stefano Spaccapietra, Chiara Renso, Gennady L. Andrienko, Natalia V. Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, José Antônio Fernandes de Macêdo, Nikos Pelekis, Yannis Theodoridis, and Zhixian Yan. Semantic trajectories modeling and analysis. ACM Comput. Surv., 45(4):42:1–42:32, 2013. doi:10.1145/2501654.2501656.
  • [18] Dieter Pfoser and Christian S. Jensen. Capturing the uncertainty of moving-object representations. In Ralf Hartmut Güting, Dimitris Papadias, and Frederick H. Lochovsky, editors, Advances in Spatial Databases, 6th International Symposium, SSD’99, Proceedings, volume 1651 of Lecture Notes in Computer Science, pages 111–132. Springer, 1999. doi:10.1007/3-540-48482-5_9.
  • [19] Alfred Tarski and J. C. C. McKinsey. A Decision Method for Elementary Algebra and Geometry. University of California Press, 1951.
  • [20] G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamberlain. Managing uncertainty in moving objects databases. ACM Trans. Database Syst., 29(3):463–507, 2004. doi:10.1145/1016028.1016030.
  • [21] Zhixian Yan, Dipanjan Chakraborty, Christine Parent, Stefano Spaccapietra, and Karl Aberer. Semantic trajectories: Mobility data computation and annotation. ACM Trans. Intell. Syst. Technol., 4(3):49:1–49:38, 2013. doi:10.1145/2483669.2483682.