Abstract 1 Introduction 2 Time-Aware Joins and Time Series Operators by Example 3 The Temporal Vadalog Architecture References Appendix A DatalogMTL

The Temporal Vadalog System

Luigi Bellomarini ORCID Bank of Italy, Rome, Italy Livia Blasi ORCID TU Wien, Vienna, Austria
Bank of Italy, Rome, Italy
Markus Nissl ORCID TU Wien, Vienna, Austria Emanuel Sallinger ORCID TU Wien, Vienna, Austria
University of Oxford, Oxford, UK
Abstract

The recent resurgence of the Datalog language in the Knowledge Representation and Reasoning community has paved the way for a very promising proposal for temporal extension. DatalogMTL (Datalog with Metric Temporal Operators) is a language that offers a good trade-off between computational complexity and expressive power. However, existing implementations are still preliminary or prototypical. In this extended abstract, we give a brief overview of Temporal Vadalog, a system supporting reasoning over DatalogMTL programs built upon an engineered architecture and adopted in production scenarios in the financial setting.

Keywords and phrases:
temporal reasoning, Datalog, DatalogMTL
Category:
Short Paper
Copyright and License:
[Uncaptioned image] © Luigi Bellomarini, Livia Blasi, Markus Nissl, and Emanuel Sallinger; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Automated reasoning
; Information systems Database management system engines
Related Version:
This is an extended abstract of a paper previously published on Theory and Practice of Logic Programming.
Funding:
This work has been supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT2201, 10.47379/VRG18013, 10.47379/NXT22018]; and by the Austrian Science Fund (FWF) 10.55776/COE12.
Editors:
Thierry Vidal and Przemysław Andrzej Wałęga

1 Introduction

The adoption of Datalog-based systems for Knowledge Representation and Reasoning (KRR) and their growing application in production settings such as the financial space [1] are going hand in hand with a wave of research into extensions to Datalog, known as the Datalog± family [10]. The aim is to strike a good balance between computational complexity and high expressivity, that is, to incorporate the features needed for real-world applications. In this context, a key requirement in KRR is native support for time through the reasoning process.

The recent introduction of the extension of Datalog with metric temporal operators, named DatalogMTL [8], along with the good computational characteristics of its fragments [16, 17, 18], brought the potential of temporal reasoning to real-world applications, from transport and robotics, from healthcare to finance. However, the integration of such capabilities into a production-ready system requires functional and architectural characteristics that ensure rigorous and efficient implementation of the language in computation and memory footprints. To our knowledge, all currently existing DatalogMTL implementations remain at a prototypical stage or are designed primarily for exploratory research [7, 19].

Contribution.

Motivated by the need to adopt temporal reasoning in high-stake financial settings for the Central Bank of Italy, in this extended abstract [4], we introduce the Temporal Vadalog System, the temporal extension to the Vadalog System, a state-of-the-art Datalog-based reasoner [2].

Overview.

Due to space constraints, the remainder of the paper provides an example-based glimpse into the functional and non-functional approaches to implementing DatalogMTL in Vadalog, specifically in Section 2, focussing on joins, and in Section 3, which describes the architecture. Some preliminaries about DatalogMTL can be found in Appendix A. For a complete overview, the reader is referred to the full paper [4].

2 Time-Aware Joins and Time Series Operators by Example

To briefly illustrate the capabilities of the system, we proceed by example with a realistic – albeit simplified – case from the financial domain.

Example 1.

A Financial Intelligence Unit (FIU) is an agency responsible for collecting information about suspicious financial activity to support investigations into money laundering or terrorism financing. They aim to identify companies with abnormal spending behavior and investigate the corporate networks they belong to. We describe the scenario by a database 𝒟 of temporally annotated facts and the following set Π of DatalogMTL rules.

[0,3]𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦(C),¬[0,7]𝑚𝑎𝑟𝑘𝑒𝑑𝐴𝑠𝑆𝑎𝑓𝑒(C) 𝑑𝑖𝑟𝑒𝑐𝑡𝑆𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠(C) (1)
𝑑𝑖𝑟𝑒𝑐𝑡𝑆𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠(C) 𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠(C) (2)
𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠(C),𝑙𝑖𝑛𝑘𝑒𝑑(C,O) 𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠(O) (3)

We assume that the reader is familiar with the logic-based formulation of Datalog syntax, with the body of a rule (left-hand side) being the implication premise, a logical conjunction of predicate over terms (i.e., constants or variables), while the head (right-hand side) is the implication conclusion. Ignoring the temporal angle, Rule 1 describes the conditions (company C involved in suspiciousActivity while not being markedAsSafe) under which a company C is marked as directSuspicious, while Rules 2-3 recursively mark every directSuspicious company C as suspicious (Rule 2) and then proceed to mark every other company o that is linked to suspicious company C as suspicious as well (Rule 3). Coming back to Rule 1, we see how the temporal aspect is essential: a company that is markedAsSafe in a distant past is not thereby exempt from having its current suspiciousActivity scrutinized. Metric temporal operators are helpful here: assuming day granularity, when prefixed with [0,3], the atom suspiciousActivity only holds if the atom itself has continuously held in the interval [t0,t3] if evaluated at t – the suspiciousActivity continuously held for the previous 3 days – while with [0,7], markedAsSafe only holds if the clearance occurred sometime between [t0,t7]. Let us consider database 𝒟:

𝒟=
{𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦(A)@[𝙼𝚊𝚢𝟶𝟷,𝙼𝚊𝚢𝟶𝟺],𝑚𝑎𝑟𝑘𝑒𝑑𝐴𝑠𝑆𝑎𝑓𝑒(A)@[𝙰𝚙𝚛𝟹𝟶,𝙰𝚙𝚛𝟹𝟶],
𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦(B)@[𝙼𝚊𝚢𝟶𝟸,𝙼𝚊𝚢𝟶𝟻],𝑚𝑎𝑟𝑘𝑒𝑑𝐴𝑠𝑆𝑎𝑓𝑒(B)@[𝙰𝚙𝚛𝟸𝟷,𝙰𝚙𝚛𝟸𝟷]}

By applying Rule 1, we see that only company B would be marked as directSuspicious as it was not markedAsSafe in the 7 days prior to the suspiciousActivity. However, to compute the conjunction between these two temporal atoms – [0,3]𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦 and [0,7]𝑚𝑎𝑟𝑘𝑒𝑑𝐴𝑠𝑆𝑎𝑓𝑒 – one would need a time-aware join, able to join facts and their temporal intervals at the same time, while also handling stratified negation.

Figure 1: Illustration of an example of a temporal join algorithm execution; atoms with predicates A0 and A1 are joined on a PlantNameTerm.
Program: EMA
Dataset Vadalog InfluxDB
NQ1 0.0953 13.989
NQ10 4.8127 34.57
NQ50 74.1943 591.554
NQ100 306.6180 820.992
Figure 2: Experiment results, expressed in seconds, for the Exponential Moving Average in Vadalog and in the TSDB InfluxDB in the full paper [4]

Temporal Join Algorithm

Algorithm 1 Temporal Join between two predicates.

Input: predicates A0 and A1 to be joined, with A0 not negated

In Temporal Vadalog, the join algorithm is a temporal interval extension of the slot machine join of the Vadalog system [2], which is based on the index nested loop join [11]. In particular, during the first full scan of the inner relation, an in-memory index is constructed, unlike in the usual algorithms, where a precomputed index is typically present in materialized form.

The algorithm, fully reported in Algorithm 1, intuitively works as follows. Assume we have two predicates to be joined A0 and A1; first, we use the A1 index to retrieve the next already-scanned fact that matches the join term(s) from A0 (Line 8). If the index does not contain such a fact, we pursue the full scan until either a matching fact is found or all facts have been examined (Lines 10-14). If no further fact is found (Lines 15-22), we continue the scan with the next A0 (if it exists and if A1 is not negated). In case a fact is found, independently of whether it is from the index or the full scan, we produce the valid interval of the joined fact using the join logic (Line 23): difference for a negated literal, intersection for a positive interval, or a blend of interval operations and set operations for temporal operators like 𝒮 (since) and 𝒰 (Until). In the end, we check whether the resulting interval is valid, and if not, if A1 is negated, we continue the scan with the next A0 (if it exists) (Lines 25-29), otherwise we proceed with the loop; if the interval is non-empty and Ak is not negated, we return true as we have found a valid joined fact (Lines 31-32); otherwise, we continue retrieving the next “negated” fact. A visual representation of the execution of a temporal join is shown in Figure 1.

Time Series Operators

In Example 1, we assumed that the suspiciousActivity facts were provided by 𝒟. Now we want to consider the predicate as a result of a different reasoning task.

Example 2.

Understanding whether a spending activity is suspicious implies detecting anomalous patterns, a task extremely relevant for financial authorities. A form of behavioural analysis is adopted here: if the number of flagged transactions is greater than the company average for a given period, then the spending activity is suspicious.

𝑓𝑙𝑎𝑔𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛𝑠(C,𝐴𝑀𝑇),𝑠𝑚𝑎(C,𝐴𝑉𝐺),𝐴𝑀𝑇>𝐴𝑉𝐺 𝑠𝑢𝑠𝑝𝑖𝑐𝑖𝑜𝑢𝑠𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦(C) (4)

The average value is given by the sma predicate (simple moving average), which encapsulates a time series operator that performs a moving average calculation to smooth the signal and filter out noise and transient variations. While time series analysis typically adopts ad-hoc software libraries, Temporal Vadalog intrinsically offers such statistics:

[0,n)𝑡𝑖𝑚𝑒𝑆𝑒𝑟𝑖𝑒𝑠(X,𝑉𝑎𝑙𝑢𝑒) 𝑒𝑥𝑡𝑒𝑛𝑑𝑒𝑑(X,𝑉𝑎𝑙𝑢𝑒) (5)
𝑡𝑖𝑚𝑒𝑆𝑒𝑟𝑖𝑒𝑠(X,𝑉𝑎𝑙𝑢𝑒),𝑒𝑥𝑡𝑒𝑛𝑑𝑒𝑑(X,Roll) 𝑟𝑜𝑙𝑙𝑖𝑛𝑔(X,𝑅𝑜𝑙𝑙) (6)
𝑟𝑜𝑙𝑙𝑖𝑛𝑔(X,Roll),𝐴𝑣𝑔=avg(Roll) 𝑠𝑚𝑎(X,𝐴𝑣𝑔) (7)

We use to extend the validity of the window over n days (Rule 5), and we join it with the original time series to pin it to the correct starting date (Rule 6). As in SMA all data points have equal weight, Rule 7 computes the mathematical average over every window. Performance has been evaluated against a time series database (TSDB) in the full paper [4], in this case with the exponential moving average (EMA), on the NASDAQ Composite Index time series [14]. Results are shown in Figure 2.

3 The Temporal Vadalog Architecture

The temporal join is only one component of the system, and several others are required in order to support query answering over a set of rules like that of Examples 1-2, among which the transformations from the temporal operators and termination of recursive rules. Looking at the larger picture, the Temporal Vadalog architecture extends the volcano iterator model [13] of the Vadalog system [6] with time-awareness. A DatalogMTL program Π is transformed into an execution pipeline that reads data from sources, applies the transformations (both algebraic and time-related ones) and returns the intended output as a result. The process consists of two stages: in the first, the pipeline is built through a sequence of compilers and optimizers that gradually transforms the set of rules into a reasoning query plan. Taking inspiration from the pipe and filters architecture [9], each required transformation is represented by a filter, while dependencies between rules are represented by pipes. The second stage is at runtime, where a pull-based approach is used. Starting from the output filter, next() calls propagate through filter chains to source filters. Each filter applies the requested transformation based on the rule it represents. As long as data are available in the filter cascade, next() succeeds.

A number of time-relevant operations are tackled along the way: (a) the transformation of time intervals through the application of temporal operators; (b) the implementation of merging operations for intervals through various strategies to ensure correctness and efficiency [3]; (c) the temporal joins in the presence of stratified negation; (d) detection of repeating temporal patterns through the so-called termination strategies, i.e. techniques to guaranteee termination; (e) temporal aggregations [5]; and (f) the possibility to switch between temporal and to non-temporal reasoning, to activate non-temporal features, essential in some reasoning settings, such as existential quantification.

Summary.

In this work, we showed the Temporal Vadalog system by first describing temporal joins and operators applied to an example, and concluded by briefly discussing the “big picture” of its architecture. The interested reader is referred to the full paper.

References

  • [1] Teodoro Baldazzi, Luigi Bellomarini, and Emanuel Sallinger. Reasoning over financial scenarios with the Vadalog system. In EDBT, pages 782–791. OpenProceedings.org, 2023. doi:10.48786/edbt.2023.66.
  • [2] Luigi Bellomarini, Davide Benedetto, Georg Gottlob, and Emanuel Sallinger. Vadalog: A modern architecture for automated reasoning with large knowledge graphs. IS, 105:101528, 2022. doi:10.1016/j.is.2020.101528.
  • [3] Luigi Bellomarini, Livia Blasi, Markus Nissl, and Emanuel Sallinger. The Temporal Vadalog system. In RuleML+RR, volume 13752, pages 130–145. Springer, 2022. doi:10.1007/978-3-031-21541-4_9.
  • [4] Luigi Bellomarini, Livia Blasi, Markus Nissl, and Emanuel Sallinger. The Temporal Vadalog system: Temporal Datalog-based reasoning. Theory and Practice of Logic Programming, pages 1–29, 2025. doi:10.1017/S1471068425000018.
  • [5] Luigi Bellomarini, Markus Nissl, and Emanuel Sallinger. Monotonic Aggregation for Temporal Datalog. In Proceedings of the 15th International Rule Challenge, volume 2956, 2021. URL: https://ceur-ws.org/Vol-2956/paper30.pdf.
  • [6] Luigi Bellomarini, Emanuel Sallinger, and Georg Gottlob. The Vadalog System: Datalog-based reasoning for knowledge graphs. PVLDB, 11(9):975–987, 2018. doi:10.14778/3213880.3213888.
  • [7] Sebastian Brandt, Elem Güzel Kalayci, Roman Kontchakov, Vladislav Ryzhikov, Guohui Xiao, and Michael Zakharyaschev. Ontology-based data access with a Horn fragment of metric temporal logic. In AAAI, pages 1070–76. AAAI Press, 2017. doi:10.1609/aaai.v31i1.10696.
  • [8] Sebastian Brandt, Elem Güzel Kalayci, Vladislav Ryzhikov, Guohui Xiao, and Michael Zakharyaschev. Querying log data with metric temporal logic. J. Artif. Intell. Res., 62:829–877, 2018. doi:10.1613/jair.1.11229.
  • [9] Frank Buschmann, Kevlin Henney, and Douglas C. Schmidt. Pattern-oriented software architecture, 4th Edition. Wiley, 2007. URL: https://www.worldcat.org/oclc/314792015.
  • [10] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Semant., 14:57–83, 2012. doi:10.1016/j.websem.2012.03.001.
  • [11] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009.
  • [12] Allen Van Gelder, Kenneth A. Ross, and John S. Schlipf. The well-founded semantics for general logic programs. J. ACM, 38(3):620–650, 1991. doi:10.1145/116825.116838.
  • [13] Goetz Graefe and William J. McKenna. The volcano optimizer generator: Extensibility and efficient search. In ICDE, pages 209–218, 1993. doi:10.1109/ICDE.1993.344061.
  • [14] NASDAQ OMX Group. NASDAQ composite index. http://tinyurl.com/frednq, 2023. FRED, Federal Reserve Bank of St. Louis, Accessed: 2023-03-22.
  • [15] David J. Tena Cucala, Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, and Egor V. Kostylev. Stratified negation in Datalog with metric temporal operators. In AAAI, pages 6488–6495, 2021. doi:10.1609/aaai.v35i7.16804.
  • [16] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. DatalogMTL: Computational complexity and expressive power. In IJCAI, pages 1886–1892, 2019. doi:10.24963/ijcai.2019/261.
  • [17] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. DatalogMTL over the integer timeline. In KR, pages 768–77, 2020. doi:10.24963/kr.2020/79.
  • [18] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. Tractable fragments of Datalog with metric temporal operators. In IJCAI, pages 1919–1925, 2020. doi:10.24963/ijcai.2020/266.
  • [19] Dingmin Wang, Pan Hu, Przemyslaw Andrzej Walega, and Bernardo Cuenca Grau. MeTeoR: Practical reasoning in Datalog with metric temporal operators. In AAAI, pages 5906–5913, 2022. doi:10.1609/aaai.v36i5.20535.

The appendix includes excerpts from the full version of the paper [4] to cover the more technical parts of this extended abstract. In particular, Appendix A provides the background of DatalogMTL.

Appendix A DatalogMTL

DatalogMTL is Datalog extended with operators from the metric temporal logic. We provide a summary of DatalogMTL with stratified negation under continuous semantics. DatalogMTL is defined over the rational timeline, i.e., an ordered set of rational numbers . An interval ϱ=ϱ,ϱ+ is a non-empty subset of such that for each t where ϱ<t<ϱ+, tϱ, and the endpoints ϱ,ϱ+{,}. The brackets denote whether the interval is closed (“[]”), half-open (“[)”,“(]”) or open (“()”), whereas angle brackets (“”) are used when unspecified. An interval is punctual if it is of the form [t,t], positive if ϱ0, and bounded if ϱ,ϱ+.

DatalogMTL extends the syntax of Datalog with negation with temporal operators [15]. For the following definitions, we consider a function-free first-order signature. An atom is of the form P(𝝉), where P is a n-ary predicate and 𝝉 is a n-ary tuple of terms, where a term is either a constant or a variable. An atom is ground if it contains no variables. A fact is an expression P(𝝉)@ϱ, where ϱ is an interval and P(𝝉) a ground atom and a database is a set of facts. A literal is an expression given by the following grammar, where ϱ is a positive interval: A::=P(𝝉)ϱAϱAϱAϱAA𝒮ϱAA𝒰ϱA. A rule is an expression given by the following grammar, where i,j0, each Ak (k0) is a literal and B is an atom: A1AinotAi+1notAi+jB. The conjunction of literals Ak is the rule body, where A1Ai denote positive literals and Ai+1Ai+j denote negated (i.e., prefixed with not) literals. The atom B is the rule head. A rule is safe if each variable occurs in at least one positive body literal, positive if it has no negated body literals (i.e., j=0), and ground if it contains no variables. A program Π is a set of safe rules and is stratifiable if there exists a stratification of a program Π. A stratification of Π is defined as a function σ that maps each predicate P in Π to a positive integer (stratum) s.t. for each rule, where Ph denotes a predicate of the head, and P+ (resp. P) a positive (negative) body predicate, σ(Ph)σ(P+) and σ(Ph)>σ(P). The semantics of DatalogMTL is given by an interpretation 𝔐 that specifies for each time point t and each ground atom P(𝝉), whether P(𝝉) is satisfied at t, in which case we write 𝔐,tP(𝝉). This satisfiability notion extends to ground literals as follows:

𝔐,t for each t𝔐,t for no t𝔐,tϱAiff 𝔐,sA for all s with tsϱ𝔐,tϱAiff 𝔐,sA for all s with stϱ𝔐,tA𝒮ϱAiff 𝔐,sA for some s with tsϱ𝔐,rA for all r(s,t)𝔐,tA𝒰ϱAiff 𝔐,sA for some s with stϱ𝔐,rA for all r(t,s)𝔐,tϱAiff 𝔐,sA for some s with tsϱ𝔐,tϱAiff 𝔐,sA for some s with stϱ

An interpretation 𝔐 satisfies notA (𝔐,tnotA), if 𝔐,t⊧̸A, a fact P(𝝉)@ϱ, if 𝔐,tP(𝝉) for all tϱ, and a set of facts 𝒟 if it is a model of each fact in 𝒟. Furthermore, 𝔐 satisfies a ground rule r if 𝔐,tAk for 0ki and 𝔐,tnotAk for i+1ki+j for every t; for every t, if the literals in the body are satisfied, so is the head 𝔐,tB; 𝔐 satisfies a rule when it satisfies every possible grounding of the rule. Moreover, 𝔐 is a model of a program if it satisfies every rule in the program and the program has a stratification, i.e., it is stratifiable. Given a stratifiable program Π and a set of facts 𝒟, we call Π,𝒟 the canonical model of Π and 𝒟 [8], and define it as the minimum model of Π and 𝒟, i.e., Π,𝒟 is the minimum model for all the facts of 𝒟 and the rules of Π. In this context, “minimum” means that the set of positive literals in 𝔐 is minimized or, equivalently, that the positive literals of this model are contained in every other model. Since Π is stratifiable, this minimum model exists and is unique [12]. According to Tena Cucala’s notation [15], we say that a stratifiable program Π and a set of facts 𝒟 entail a fact P(𝝉)@ϱ, written as (Π,𝒟)P(𝝉)@ϱ, if Π,𝒟P(𝝉)@ϱ. In the remainder of the paper, we will assume the stratification of programs (or set of rules) as implicit.

In this context, the query answering or reasoning task is defined as follows: given the pair Q=(Π,𝐴𝑛𝑠), where Π is a set of rules, 𝐴𝑛𝑠 is an n-ary predicate, and the query Q is evaluated over 𝒟, then Q(𝒟) is defined as Q(𝒟)={(t¯,ϱ)𝑑𝑜𝑚(𝒟)n×𝑡𝑖𝑚𝑒(𝒟)(Π,𝒟)𝐴𝑛𝑠(t¯)@ϱ}, where t¯ is a tuple of terms, the domain of 𝒟, denoted 𝑑𝑜𝑚(𝒟), is the set of all constants that appear in the facts of 𝒟, and the set of all the time intervals in 𝒟 is denoted as 𝑡𝑖𝑚𝑒(𝒟). As we shall see in practical cases, the 𝐴𝑛𝑠 predicate of Π will be sometimes called “query predicate” and provided to the reasoning system with specific conventions, which we omit for space reasons, but will render in textual explanations.