The Temporal Vadalog System
Abstract
The recent resurgence of the Datalog language in the Knowledge Representation and Reasoning community has paved the way for a very promising proposal for temporal extension. DatalogMTL (Datalog with Metric Temporal Operators) is a language that offers a good trade-off between computational complexity and expressive power. However, existing implementations are still preliminary or prototypical. In this extended abstract, we give a brief overview of Temporal Vadalog, a system supporting reasoning over DatalogMTL programs built upon an engineered architecture and adopted in production scenarios in the financial setting.
Keywords and phrases:
temporal reasoning, Datalog, DatalogMTLCategory:
Short PaperCopyright and License:
2012 ACM Subject Classification:
Theory of computation Automated reasoning ; Information systems Database management system enginesRelated Version:
This is an extended abstract of a paper previously published on Theory and Practice of Logic Programming.Funding:
This work has been supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT2201, 10.47379/VRG18013, 10.47379/NXT22018]; and by the Austrian Science Fund (FWF) 10.55776/COE12.Editors:
Thierry Vidal and Przemysław Andrzej WałęgaSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The adoption of Datalog-based systems for Knowledge Representation and Reasoning (KRR) and their growing application in production settings such as the financial space [1] are going hand in hand with a wave of research into extensions to Datalog, known as the Datalog± family [10]. The aim is to strike a good balance between computational complexity and high expressivity, that is, to incorporate the features needed for real-world applications. In this context, a key requirement in KRR is native support for time through the reasoning process.
The recent introduction of the extension of Datalog with metric temporal operators, named DatalogMTL [8], along with the good computational characteristics of its fragments [16, 17, 18], brought the potential of temporal reasoning to real-world applications, from transport and robotics, from healthcare to finance. However, the integration of such capabilities into a production-ready system requires functional and architectural characteristics that ensure rigorous and efficient implementation of the language in computation and memory footprints. To our knowledge, all currently existing DatalogMTL implementations remain at a prototypical stage or are designed primarily for exploratory research [7, 19].
Contribution.
Overview.
Due to space constraints, the remainder of the paper provides an example-based glimpse into the functional and non-functional approaches to implementing DatalogMTL in Vadalog, specifically in Section 2, focussing on joins, and in Section 3, which describes the architecture. Some preliminaries about DatalogMTL can be found in Appendix A. For a complete overview, the reader is referred to the full paper [4].
2 Time-Aware Joins and Time Series Operators by Example
To briefly illustrate the capabilities of the system, we proceed by example with a realistic – albeit simplified – case from the financial domain.
Example 1.
A Financial Intelligence Unit (FIU) is an agency responsible for collecting information about suspicious financial activity to support investigations into money laundering or terrorism financing. They aim to identify companies with abnormal spending behavior and investigate the corporate networks they belong to. We describe the scenario by a database of temporally annotated facts and the following set of DatalogMTL rules.
| (1) | ||||
| (2) | ||||
| (3) |
We assume that the reader is familiar with the logic-based formulation of Datalog syntax, with the body of a rule (left-hand side) being the implication premise, a logical conjunction of predicate over terms (i.e., constants or variables), while the head (right-hand side) is the implication conclusion. Ignoring the temporal angle, Rule 1 describes the conditions (company C involved in suspiciousActivity while not being markedAsSafe) under which a company C is marked as directSuspicious, while Rules 2-3 recursively mark every directSuspicious company C as suspicious (Rule 2) and then proceed to mark every other company o that is linked to suspicious company C as suspicious as well (Rule 3). Coming back to Rule 1, we see how the temporal aspect is essential: a company that is markedAsSafe in a distant past is not thereby exempt from having its current suspiciousActivity scrutinized. Metric temporal operators are helpful here: assuming day granularity, when prefixed with , the atom suspiciousActivity only holds if the atom itself has continuously held in the interval if evaluated at – the suspiciousActivity continuously held for the previous 3 days – while with , markedAsSafe only holds if the clearance occurred sometime between . Let us consider database :
By applying Rule 1, we see that only company B would be marked as directSuspicious as it was not markedAsSafe in the 7 days prior to the suspiciousActivity. However, to compute the conjunction between these two temporal atoms – and – one would need a time-aware join, able to join facts and their temporal intervals at the same time, while also handling stratified negation.
| Program: | EMA | |
|---|---|---|
| Dataset | Vadalog | InfluxDB |
| NQ1 | 0.0953 | 13.989 |
| NQ10 | 4.8127 | 34.57 |
| NQ50 | 74.1943 | 591.554 |
| NQ100 | 306.6180 | 820.992 |
Temporal Join Algorithm
Input: predicates and to be joined, with not negated
In Temporal Vadalog, the join algorithm is a temporal interval extension of the slot machine join of the Vadalog system [2], which is based on the index nested loop join [11]. In particular, during the first full scan of the inner relation, an in-memory index is constructed, unlike in the usual algorithms, where a precomputed index is typically present in materialized form.
The algorithm, fully reported in Algorithm 1, intuitively works as follows. Assume we have two predicates to be joined and ; first, we use the index to retrieve the next already-scanned fact that matches the join term(s) from (Line 8). If the index does not contain such a fact, we pursue the full scan until either a matching fact is found or all facts have been examined (Lines 10-14). If no further fact is found (Lines 15-22), we continue the scan with the next (if it exists and if is not negated). In case a fact is found, independently of whether it is from the index or the full scan, we produce the valid interval of the joined fact using the join logic (Line 23): difference for a negated literal, intersection for a positive interval, or a blend of interval operations and set operations for temporal operators like (since) and (Until). In the end, we check whether the resulting interval is valid, and if not, if is negated, we continue the scan with the next (if it exists) (Lines 25-29), otherwise we proceed with the loop; if the interval is non-empty and is not negated, we return true as we have found a valid joined fact (Lines 31-32); otherwise, we continue retrieving the next “negated” fact. A visual representation of the execution of a temporal join is shown in Figure 1.
Time Series Operators
In Example 1, we assumed that the suspiciousActivity facts were provided by . Now we want to consider the predicate as a result of a different reasoning task.
Example 2.
Understanding whether a spending activity is suspicious implies detecting anomalous patterns, a task extremely relevant for financial authorities. A form of behavioural analysis is adopted here: if the number of flagged transactions is greater than the company average for a given period, then the spending activity is suspicious.
| (4) |
The average value is given by the sma predicate (simple moving average), which encapsulates a time series operator that performs a moving average calculation to smooth the signal and filter out noise and transient variations. While time series analysis typically adopts ad-hoc software libraries, Temporal Vadalog intrinsically offers such statistics:
| (5) | ||||
| (6) | ||||
| (7) |
We use to extend the validity of the window over n days (Rule 5), and we join it with the original time series to pin it to the correct starting date (Rule 6). As in SMA all data points have equal weight, Rule 7 computes the mathematical average over every window. Performance has been evaluated against a time series database (TSDB) in the full paper [4], in this case with the exponential moving average (EMA), on the NASDAQ Composite Index time series [14]. Results are shown in Figure 2.
3 The Temporal Vadalog Architecture
The temporal join is only one component of the system, and several others are required in order to support query answering over a set of rules like that of Examples 1-2, among which the transformations from the temporal operators and termination of recursive rules. Looking at the larger picture, the Temporal Vadalog architecture extends the volcano iterator model [13] of the Vadalog system [6] with time-awareness. A DatalogMTL program is transformed into an execution pipeline that reads data from sources, applies the transformations (both algebraic and time-related ones) and returns the intended output as a result. The process consists of two stages: in the first, the pipeline is built through a sequence of compilers and optimizers that gradually transforms the set of rules into a reasoning query plan. Taking inspiration from the pipe and filters architecture [9], each required transformation is represented by a filter, while dependencies between rules are represented by pipes. The second stage is at runtime, where a pull-based approach is used. Starting from the output filter, next() calls propagate through filter chains to source filters. Each filter applies the requested transformation based on the rule it represents. As long as data are available in the filter cascade, next() succeeds.
A number of time-relevant operations are tackled along the way: (a) the transformation of time intervals through the application of temporal operators; (b) the implementation of merging operations for intervals through various strategies to ensure correctness and efficiency [3]; (c) the temporal joins in the presence of stratified negation; (d) detection of repeating temporal patterns through the so-called termination strategies, i.e. techniques to guaranteee termination; (e) temporal aggregations [5]; and (f) the possibility to switch between temporal and to non-temporal reasoning, to activate non-temporal features, essential in some reasoning settings, such as existential quantification.
Summary.
In this work, we showed the Temporal Vadalog system by first describing temporal joins and operators applied to an example, and concluded by briefly discussing the “big picture” of its architecture. The interested reader is referred to the full paper.
References
- [1] Teodoro Baldazzi, Luigi Bellomarini, and Emanuel Sallinger. Reasoning over financial scenarios with the Vadalog system. In EDBT, pages 782–791. OpenProceedings.org, 2023. doi:10.48786/edbt.2023.66.
- [2] Luigi Bellomarini, Davide Benedetto, Georg Gottlob, and Emanuel Sallinger. Vadalog: A modern architecture for automated reasoning with large knowledge graphs. IS, 105:101528, 2022. doi:10.1016/j.is.2020.101528.
- [3] Luigi Bellomarini, Livia Blasi, Markus Nissl, and Emanuel Sallinger. The Temporal Vadalog system. In RuleML+RR, volume 13752, pages 130–145. Springer, 2022. doi:10.1007/978-3-031-21541-4_9.
- [4] Luigi Bellomarini, Livia Blasi, Markus Nissl, and Emanuel Sallinger. The Temporal Vadalog system: Temporal Datalog-based reasoning. Theory and Practice of Logic Programming, pages 1–29, 2025. doi:10.1017/S1471068425000018.
- [5] Luigi Bellomarini, Markus Nissl, and Emanuel Sallinger. Monotonic Aggregation for Temporal Datalog. In Proceedings of the 15th International Rule Challenge, volume 2956, 2021. URL: https://ceur-ws.org/Vol-2956/paper30.pdf.
- [6] Luigi Bellomarini, Emanuel Sallinger, and Georg Gottlob. The Vadalog System: Datalog-based reasoning for knowledge graphs. PVLDB, 11(9):975–987, 2018. doi:10.14778/3213880.3213888.
- [7] Sebastian Brandt, Elem Güzel Kalayci, Roman Kontchakov, Vladislav Ryzhikov, Guohui Xiao, and Michael Zakharyaschev. Ontology-based data access with a Horn fragment of metric temporal logic. In AAAI, pages 1070–76. AAAI Press, 2017. doi:10.1609/aaai.v31i1.10696.
- [8] Sebastian Brandt, Elem Güzel Kalayci, Vladislav Ryzhikov, Guohui Xiao, and Michael Zakharyaschev. Querying log data with metric temporal logic. J. Artif. Intell. Res., 62:829–877, 2018. doi:10.1613/jair.1.11229.
- [9] Frank Buschmann, Kevlin Henney, and Douglas C. Schmidt. Pattern-oriented software architecture, 4th Edition. Wiley, 2007. URL: https://www.worldcat.org/oclc/314792015.
- [10] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Semant., 14:57–83, 2012. doi:10.1016/j.websem.2012.03.001.
- [11] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009.
- [12] Allen Van Gelder, Kenneth A. Ross, and John S. Schlipf. The well-founded semantics for general logic programs. J. ACM, 38(3):620–650, 1991. doi:10.1145/116825.116838.
- [13] Goetz Graefe and William J. McKenna. The volcano optimizer generator: Extensibility and efficient search. In ICDE, pages 209–218, 1993. doi:10.1109/ICDE.1993.344061.
- [14] NASDAQ OMX Group. NASDAQ composite index. http://tinyurl.com/frednq, 2023. FRED, Federal Reserve Bank of St. Louis, Accessed: 2023-03-22.
- [15] David J. Tena Cucala, Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, and Egor V. Kostylev. Stratified negation in Datalog with metric temporal operators. In AAAI, pages 6488–6495, 2021. doi:10.1609/aaai.v35i7.16804.
- [16] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. DatalogMTL: Computational complexity and expressive power. In IJCAI, pages 1886–1892, 2019. doi:10.24963/ijcai.2019/261.
- [17] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. DatalogMTL over the integer timeline. In KR, pages 768–77, 2020. doi:10.24963/kr.2020/79.
- [18] Przemyslaw Andrzej Walega, Bernardo Cuenca Grau, Mark Kaminski, and Egor V. Kostylev. Tractable fragments of Datalog with metric temporal operators. In IJCAI, pages 1919–1925, 2020. doi:10.24963/ijcai.2020/266.
- [19] Dingmin Wang, Pan Hu, Przemyslaw Andrzej Walega, and Bernardo Cuenca Grau. MeTeoR: Practical reasoning in Datalog with metric temporal operators. In AAAI, pages 5906–5913, 2022. doi:10.1609/aaai.v36i5.20535.
The appendix includes excerpts from the full version of the paper [4] to cover the more technical parts of this extended abstract. In particular, Appendix A provides the background of DatalogMTL.
Appendix A DatalogMTL
DatalogMTL is Datalog extended with operators from the metric temporal logic. We provide a summary of DatalogMTL with stratified negation under continuous semantics. DatalogMTL is defined over the rational timeline, i.e., an ordered set of rational numbers . An interval is a non-empty subset of such that for each where , , and the endpoints . The brackets denote whether the interval is closed (“”), half-open (“”,“”) or open (“”), whereas angle brackets (“”) are used when unspecified. An interval is punctual if it is of the form , positive if , and bounded if .
DatalogMTL extends the syntax of Datalog with negation with temporal operators [15]. For the following definitions, we consider a function-free first-order signature. An atom is of the form , where is a -ary predicate and is a -ary tuple of terms, where a term is either a constant or a variable. An atom is ground if it contains no variables. A fact is an expression , where is an interval and a ground atom and a database is a set of facts. A literal is an expression given by the following grammar, where is a positive interval: . A rule is an expression given by the following grammar, where , each () is a literal and is an atom: . The conjunction of literals is the rule body, where denote positive literals and denote negated (i.e., prefixed with not) literals. The atom is the rule head. A rule is safe if each variable occurs in at least one positive body literal, positive if it has no negated body literals (i.e., ), and ground if it contains no variables. A program is a set of safe rules and is stratifiable if there exists a stratification of a program . A stratification of is defined as a function that maps each predicate in to a positive integer (stratum) s.t. for each rule, where denotes a predicate of the head, and (resp. ) a positive (negative) body predicate, and . The semantics of DatalogMTL is given by an interpretation that specifies for each time point and each ground atom , whether is satisfied at , in which case we write . This satisfiability notion extends to ground literals as follows:
An interpretation satisfies (), if , a fact , if for all , and a set of facts if it is a model of each fact in . Furthermore, satisfies a ground rule if for and for for every ; for every , if the literals in the body are satisfied, so is the head ; satisfies a rule when it satisfies every possible grounding of the rule. Moreover, is a model of a program if it satisfies every rule in the program and the program has a stratification, i.e., it is stratifiable. Given a stratifiable program and a set of facts , we call the canonical model of and [8], and define it as the minimum model of and , i.e., is the minimum model for all the facts of and the rules of . In this context, “minimum” means that the set of positive literals in is minimized or, equivalently, that the positive literals of this model are contained in every other model. Since is stratifiable, this minimum model exists and is unique [12]. According to Tena Cucala’s notation [15], we say that a stratifiable program and a set of facts entail a fact , written as , if . In the remainder of the paper, we will assume the stratification of programs (or set of rules) as implicit.
In this context, the query answering or reasoning task is defined as follows: given the pair , where is a set of rules, is an -ary predicate, and the query is evaluated over , then is defined as , where is a tuple of terms, the domain of , denoted , is the set of all constants that appear in the facts of , and the set of all the time intervals in is denoted as . As we shall see in practical cases, the predicate of will be sometimes called “query predicate” and provided to the reasoning system with specific conventions, which we omit for space reasons, but will render in textual explanations.
