DROPS

Volume

LIPIcs, Volume 127

22nd International Conference on Database Theory (ICDT 2019)

ICDT 2019, March 26-28, 2019, Lisbon, Portugal

Editors: Pablo Barcelo and Marco Calautti

Document

DOI: 10.4230/LIPIcs.MFCS.2023.17

Separating Automatic Relations

Authors: Pablo Barceló, Diego Figueira, and Rémi Morvan

Published in: LIPIcs, Volume 272, 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023)

Abstract

We study the separability problem for automatic relations (i.e., relations on finite words definable by synchronous automata) in terms of recognizable relations (i.e., finite unions of products of regular languages). This problem takes as input two automatic relations R and R', and asks if there exists a recognizable relation S that contains R and does not intersect R'. We show this problem to be undecidable when the number of products allowed in the recognizable relation is fixed. In particular, checking if there exists a recognizable relation S with at most k products of regular languages that separates R from R' is undecidable, for each fixed k ⩾ 2. Our proofs reveal tight connections, of independent interest, between the separability problem and the finite coloring problem for automatic graphs, where colors are regular languages.
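This is a minimal Python sketch of the separation condition. It checks that R ⊆ S and S ∩ R′ = ∅, where S is a finite union of products of regular languages. It only checks these conditions on finite samples of R and R′. Encoding each regular language as a Python regular expression is an assumption, and the function names are hypothetical. The actual problem concerns infinite automatic relations given by synchronous automata, and for a fixed bound k ⩾ 2 on the number of products it is undecidable. A sample check like this one cannot decide it.

```python
import re

# Hypothetical representation: a product L1 x L2 of regular languages is a pair
# of Python regexes, and a recognizable relation S is a list of such products.
Product = tuple[str, str]


def in_recognizable(pair: tuple[str, str], products: list[Product]) -> bool:
    """Return True if (u, v) belongs to some product L1 x L2 of S."""
    u, v = pair
    return any(
        re.fullmatch(l1, u) is not None and re.fullmatch(l2, v) is not None
        for l1, l2 in products
    )


def separates_on_samples(
    products: list[Product],
    r_sample: list[tuple[str, str]],
    r_prime_sample: list[tuple[str, str]],
) -> bool:
    """Check R ⊆ S and S ∩ R' = ∅, but only on the given finite samples."""
    contains_r = all(in_recognizable(p, products) for p in r_sample)
    avoids_r_prime = not any(in_recognizable(p, products) for p in r_prime_sample)
    return contains_r and avoids_r_prime


# Usage: S = a+ x b+ is a single product (k = 1).
s = [("a+", "b+")]
print(separates_on_samples(s, [("a", "b"), ("aa", "bbb")], [("b", "a")]))  # True
```

Here len(products) plays the role of the bound k on the number of products.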

Cite as

Pablo Barceló, Diego Figueira, and Rémi Morvan. Separating Automatic Relations. In 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 272, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{barcelo_et_al:LIPIcs.MFCS.2023.17,
  author =	{Barcel\'{o}, Pablo and Figueira, Diego and Morvan, R\'{e}mi},
  title =	{{Separating Automatic Relations}},
  booktitle =	{48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-292-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{272},
  editor =	{Leroux, J\'{e}r\^{o}me and Lombardy, Sylvain and Peleg, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2023.17},
  URN =		{urn:nbn:de:0030-drops-185514},
  doi =		{10.4230/LIPIcs.MFCS.2023.17},
  annote =	{Keywords: Automatic relations, recognizable relations, separability, finite colorability}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2023.15

Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries

Authors: Diego Figueira and Rémi Morvan

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)

Abstract

We show that the problem of whether a query is equivalent to a query of tree-width k is decidable, for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barceló, Romero, and Vardi [Pablo Barceló et al., 2016] has shown decidability for the case k = 1, and here we show that decidability in fact holds for any arbitrary k > 1. The algorithm is in 2ExpSpace, but for the restricted but practically relevant case where all regular expressions of the query are of the form a^* or (a_1 + ... + a_n) we show that the complexity of the problem drops to Π^p_2. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number k, builds the maximal under-approximation of tree-width k of a UC2RPQ. The maximal under-approximation of tree-width k of a query q is a query q' of tree-width k which is contained in q in a maximal and unique way, that is, such that for every query q'' of tree-width k, if q'' is contained in q then q'' is also contained in q'.

Cite as

Diego Figueira and Rémi Morvan. Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{figueira_et_al:LIPIcs.ICDT.2023.15,
  author =	{Figueira, Diego and Morvan, R\'{e}mi},
  title =	{{Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.15},
  URN =		{urn:nbn:de:0030-drops-177575},
  doi =		{10.4230/LIPIcs.ICDT.2023.15},
  annote =	{Keywords: graph databases, conjunctive regular path queries, semantic optimization, tree-width, containment, approximation}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2021.1

Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk)

Authors: Pablo Barceló

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over an ML model in order to obtain meaningful justifications for its results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management, with the intention of suggesting new possibilities for cross-fertilization between that area and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models.
The obtained results will show an interesting theoretical counterpart to conventional wisdom's claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.
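The SHAP-score is commonly defined as a Shapley value. The value of a feature set S is the expected model output when the features in S are fixed to those of the input entity. Below is a brute-force Python sketch of that definition. It assumes Boolean features and the uniform distribution over {0,1}^n; other product distributions are also used in the literature. The function names are hypothetical. The sketch is exponential in n. It only illustrates the definition and is not the tractable algorithm for deterministic and decomposable circuits.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial
from typing import Callable, Sequence

# A Boolean classifier over n Boolean features (hypothetical interface).
Model = Callable[[Sequence[int]], int]


def expected_value(model: Model, entity: Sequence[int], fixed: set[int]) -> Fraction:
    """E[model(e')] for e' uniform over {0,1}^n, agreeing with `entity` on `fixed`."""
    free = [i for i in range(len(entity)) if i not in fixed]
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(free)):
        e = list(entity)
        for i, bit in zip(free, bits):
            e[i] = bit
        total += model(e)
    return total / 2 ** len(free)


def shap_score(model: Model, entity: Sequence[int], feature: int) -> Fraction:
    """Shapley value of `feature`, using expected_value as the coalition value."""
    n = len(entity)
    others = [i for i in range(n) if i != feature]
    score = Fraction(0)
    for size in range(n):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for subset in combinations(others, size):
            s = set(subset)
            gain = expected_value(model, entity, s | {feature}) - expected_value(model, entity, s)
            score += weight * gain
    return score


def and_model(e: Sequence[int]) -> int:
    # Toy classifier: x1 AND x2.
    return int(e[0] == 1 and e[1] == 1)


print(shap_score(and_model, (1, 1), 0))  # 3/8
```

By symmetry, both features of the example get 3/8. Their sum is M(e) − E[M] = 1 − 1/4, as the efficiency property of Shapley values requires.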

Cite as

Pablo Barceló. Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk). In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{barcelo:LIPIcs.ICDT.2021.1,
  author =	{Barcel\'{o}, Pablo},
  title =	{{Explainability Queries for ML Models and its Connections with Data Management Problems}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.1},
  URN =		{urn:nbn:de:0030-drops-137091},
  doi =		{10.4230/LIPIcs.ICDT.2021.1},
  annote =	{Keywords: ML models, Explainability, Shapley values, decision trees, OBDDs, deterministic and decomposable Boolean circuits}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2020.6

On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra

Authors: Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)

Abstract

We study the expressive power of the Lara language (a recently proposed unified model for expressing relational and linear algebra operations) both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing Lara to be expressively complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions which transform values in tables, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning applications. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much more difficult to understand. In particular, although under complexity assumptions the resulting language still cannot express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.

Cite as

Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2020.6,
  author =	{Barcel\'{o}, Pablo and Higuera, Nelson and P\'{e}rez, Jorge and Subercaseaux, Bernardo},
  title =	{{On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.6},
  URN =		{urn:nbn:de:0030-drops-119305},
  doi =		{10.4230/LIPIcs.ICDT.2020.6},
  annote =	{Keywords: languages for linear and relational algebra, expressive power, first order logic with aggregation, matrix convolution, matrix inverse, query genericity, locality of queries, safety}
}

Document

Track B: Automata, Logic, Semantics, and Theory of Programming

DOI: 10.4230/LIPIcs.ICALP.2019.103

Monadic Decomposability of Regular Relations (Track B: Automata, Logic, Semantics, and Theory of Programming)

Authors: Pablo Barceló, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, and Reino Niskanen

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Abstract

Monadic decomposability, i.e., the ability to determine whether a formula in a given logical theory can be decomposed into a Boolean combination of monadic formulas, is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it is recognizable, i.e., whether it has a monadic decomposition (that is, a representation as a Boolean combination of Cartesian products of regular languages). Regular relations are expressive formalisms which, using an appropriate string encoding, can capture relations definable in Presburger Arithmetic. In fact, their expressive power coincides with relations definable in a universal automatic structure; equivalently, those definable by finite set interpretations in WS1S (Weak Second Order Theory of One Successor). Determining whether a regular relation is recognizable was known to be decidable (and in exponential time for binary relations), but its precise complexity had hitherto remained open. Our main contribution is to fully settle the complexity of this decision problem by developing new techniques employing infinite Ramsey theory. The complexity for DFA (resp. NFA) representations of regular relations is shown to be NLOGSPACE-complete (resp. PSPACE-complete).
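A regular (synchronized rational) relation is read by a finite automaton over the convolution of its words. The automaton reads the words letter by letter in parallel, padding the shorter one with a fresh symbol. The Python sketch below encodes pairs this way and runs a two-state DFA for the equal-length relation {(u, v) : |u| = |v|}. This relation is a standard example of a regular relation that is not recognizable, i.e., it has no monadic decomposition. The padding symbol `#` is a hypothetical choice, assumed not to occur in the alphabet, and the function names are illustrative only.

```python
PAD = "#"  # hypothetical padding symbol, assumed not to be a letter of the alphabet


def convolution(u: str, v: str) -> list[tuple[str, str]]:
    """Pair u and v letter by letter, padding the shorter word with PAD."""
    n = max(len(u), len(v))
    return [
        (u[i] if i < len(u) else PAD, v[i] if i < len(v) else PAD)
        for i in range(n)
    ]


def accepts_equal_length(word: list[tuple[str, str]]) -> bool:
    """Two-state DFA over letter pairs: stay in 'q0' while no padding is read."""
    state = "q0"  # the only accepting state
    for a, b in word:
        if state == "q0" and PAD not in (a, b):
            state = "q0"
        else:
            state = "sink"  # rejecting state that is never left
    return state == "q0"


print(accepts_equal_length(convolution("abb", "bab")))  # True
print(accepts_equal_length(convolution("ab", "b")))     # False
```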

Cite as

Pablo Barceló, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, and Reino Niskanen. Monadic Decomposability of Regular Relations (Track B: Automata, Logic, Semantics, and Theory of Programming). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 103:1-103:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{barcelo_et_al:LIPIcs.ICALP.2019.103,
  author =	{Barcel\'{o}, Pablo and Hong, Chih-Duo and Le, Xuan-Bach and Lin, Anthony W. and Niskanen, Reino},
  title =	{{Monadic Decomposability of Regular Relations}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{103:1--103:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.103},
  URN =		{urn:nbn:de:0030-drops-106790},
  doi =		{10.4230/LIPIcs.ICALP.2019.103},
  annote =	{Keywords: Transducers, Automata, Synchronized Rational Relations, Ramsey Theory, Variable Independence, Automatic Structures}
}

Document

Track B: Automata, Logic, Semantics, and Theory of Programming

DOI: 10.4230/LIPIcs.ICALP.2019.104

Boundedness of Conjunctive Regular Path Queries (Track B: Automata, Logic, Semantics, and Theory of Programming)

Authors: Pablo Barceló, Diego Figueira, and Miguel Romero

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Abstract

We study the boundedness problem for unions of conjunctive regular path queries with inverses (UC2RPQs). This is the problem of, given a UC2RPQ, checking whether it is equivalent to a union of conjunctive queries (UCQ). We show the problem to be ExpSpace-complete, thus coinciding with the complexity of containment for UC2RPQs. As a corollary, when a UC2RPQ is bounded, it is equivalent to a UCQ of at most triple-exponential size, and in fact we show that this bound is optimal. We also study better behaved classes of UC2RPQs, namely acyclic UC2RPQs of bounded thickness, and strongly connected UCRPQs, whose boundedness problem is, respectively, PSpace-complete and Π^P_2-complete. Most upper bounds exploit results on limitedness for distance automata, in particular extending the model with alternation and two-wayness, which may be of independent interest.

Cite as

Pablo Barceló, Diego Figueira, and Miguel Romero. Boundedness of Conjunctive Regular Path Queries (Track B: Automata, Logic, Semantics, and Theory of Programming). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 104:1-104:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{barcelo_et_al:LIPIcs.ICALP.2019.104,
  author =	{Barcel\'{o}, Pablo and Figueira, Diego and Romero, Miguel},
  title =	{{Boundedness of Conjunctive Regular Path Queries}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{104:1--104:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.104},
  URN =		{urn:nbn:de:0030-drops-106803},
  doi =		{10.4230/LIPIcs.ICALP.2019.104},
  annote =	{Keywords: regular path queries, boundedness, limitedness, distance automata}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.ICDT.2019

LIPIcs, Volume 127, ICDT'19, Complete Volume

Authors: Pablo Barcelo and Marco Calautti

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

LIPIcs, Volume 127, ICDT'19, Complete Volume

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@Proceedings{barcelo_et_al:LIPIcs.ICDT.2019,
  title =	{{LIPIcs, Volume 127, ICDT'19, Complete Volume}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019},
  URN =		{urn:nbn:de:0030-drops-103630},
  doi =		{10.4230/LIPIcs.ICDT.2019},
  annote =	{Keywords: Computing Methodologies, Knowledge Representation and Reasoning, Theory of computation, Data modeling, Incomplete, inconsistent and uncertain database Information systems, Data management systems, Data streams, Database query processing, Incomplete data, Inconsistent data, Relational database model}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.ICDT.2019.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Pablo Barcelo and Marco Calautti

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2019.0,
  author =	{Barcelo, Pablo and Calautti, Marco},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.0},
  URN =		{urn:nbn:de:0030-drops-103020},
  doi =		{10.4230/LIPIcs.ICDT.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2019.1

Learning Models over Relational Databases (Invited Talk)

Authors: Dan Olteanu

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

In this talk, I will make the case for a first-principles approach to machine learning over relational databases that exploits recent developments in database systems and theory. The input to learning classification and regression models is defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using statistical software packages. These three steps are expensive and unnecessary. Instead, one can cast the machine learning problem as a database problem by decomposing the learning task into a batch of aggregates over the feature extraction query and by computing this batch over the input database. The performance of this database-centric approach benefits tremendously from structural properties of the relational data and of the feature extraction query; such properties may be algebraic (semi-ring), combinatorial (hypertree width), or statistical (sampling). It also benefits from database systems techniques such as factorized query evaluation and query compilation. For a variety of models, including factorization machines, decision trees, and support vector machines, this approach may come with lower computational complexity than the materialization of the training dataset used by the mainstream approach. Recent results show that this translates to several orders-of-magnitude speed-up over state-of-the-art systems such as TensorFlow, R, Scikit-learn, and mlpack. While these initial results are promising, much more remains to be discovered.
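The idea of computing aggregates over the feature extraction query without materializing its result can be illustrated on a single join R(A, B) ⋈ S(B, C). This is a minimal Python sketch with hypothetical relation and function names. It computes COUNT and SUM(A·C) by first aggregating each relation per join value B, using the identity Σ over the join of a·c = Σ_b (Σ_{(a,b)∈R} a)(Σ_{(b,c)∈S} c). SUM(A·C) is the kind of entry that appears in the XᵀX matrix used by linear regression. The sketch is a textbook illustration of pushing aggregates past joins, not the specific system described in the talk.

```python
from collections import Counter, defaultdict


def count_join_materialized(r, s):
    """Baseline: enumerate every tuple of R(A, B) ⋈ S(B, C) and count them."""
    return sum(1 for (_, b) in r for (b2, _) in s if b == b2)


def count_join_factorized(r, s):
    """COUNT(*) over the join from per-B counts; no join tuple is built."""
    r_per_b = Counter(b for _, b in r)
    s_per_b = Counter(b for b, _ in s)
    return sum(n * s_per_b[b] for b, n in r_per_b.items())


def sum_a_times_c_factorized(r, s):
    """SUM(A * C) over the join from per-B partial sums."""
    r_sum = defaultdict(int)
    s_sum = defaultdict(int)
    for a, b in r:
        r_sum[b] += a
    for b, c in s:
        s_sum[b] += c
    return sum(total * s_sum.get(b, 0) for b, total in r_sum.items())


r = [(1, "x"), (2, "x"), (3, "y")]
s = [("x", 10), ("y", 20), ("y", 30), ("z", 5)]
print(count_join_factorized(r, s), sum_a_times_c_factorized(r, s))  # 4 180
```

The factorized functions touch each input tuple once. The materialized baseline instead pays for every join tuple, and the join can be much larger than its inputs.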

Cite as

Dan Olteanu. Learning Models over Relational Databases (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{olteanu:LIPIcs.ICDT.2019.1,
  author =	{Olteanu, Dan},
  title =	{{Learning Models over Relational Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.1},
  URN =		{urn:nbn:de:0030-drops-103034},
  doi =		{10.4230/LIPIcs.ICDT.2019.1},
  annote =	{Keywords: In-database analytics, Data complexity, Feature extraction queries, Database dependencies, Model reparameterization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2019.2

The Power of Relational Learning (Invited Talk)

Authors: Lise Getoor

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

We live in a richly interconnected world and, not surprisingly, we generate richly interconnected data. From smart cities to social media to financial networks to biological networks, data is relational. While database theory is built on strong relational foundations, the same is not true for machine learning. The majority of machine learning methods flatten data into a single table before performing any processing. Further, database theory is also built on a bedrock of declarative representations. The same is not true for machine learning, in particular deep learning, which results in black-box, uninterpretable and unexplainable models. In this talk, I will introduce the field of statistical relational learning, an alternative machine learning approach based on declarative relational representations paired with probabilistic models. I’ll describe our work on probabilistic soft logic, a probabilistic programming language that is ideally suited to richly connected, noisy data. Our recent results show that by building on state-of-the-art optimization methods in a distributed implementation, we can solve very large relational learning problems orders of magnitude faster than existing approaches.
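Probabilistic soft logic is usually formulated via hinge-loss Markov random fields. Ground atoms get truth values in [0, 1], and conjunction is relaxed with the Łukasiewicz t-norm. Each weighted rule body → head is penalized by its distance to satisfaction, max(0, I(body) − I(head)), optionally squared. The Python sketch below computes that penalty for one ground rule. The function names are hypothetical, and this is not the API of the PSL software.

```python
def lukasiewicz_and(*truths: float) -> float:
    """Łukasiewicz t-norm: max(0, sum of truth values - (n - 1))."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: list[float], head: float) -> float:
    """How far the ground rule body -> head is from being satisfied."""
    return max(0.0, lukasiewicz_and(*body) - head)


def hinge_loss(weight: float, body: list[float], head: float, squared: bool = False) -> float:
    """Weighted (optionally squared) hinge-loss potential of one ground rule."""
    d = distance_to_satisfaction(body, head)
    return weight * (d * d if squared else d)


# Ground rule Friends(a, b) AND Votes(a, p) -> Votes(b, p) with truth values
# 0.5 and 1.0 for the body atoms and 0.25 for the head atom.
print(hinge_loss(2.0, [0.5, 1.0], 0.25))  # 0.5
```

MAP inference then minimizes the sum of such convex potentials over all ground rules.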

Cite as

Lise Getoor. The Power of Relational Learning (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{getoor:LIPIcs.ICDT.2019.2,
  author =	{Getoor, Lise},
  title =	{{The Power of Relational Learning}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.2},
  URN =		{urn:nbn:de:0030-drops-103048},
  doi =		{10.4230/LIPIcs.ICDT.2019.2},
  annote =	{Keywords: Machine learning, Probabilistic soft logic, Relational model}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2019.3

The Power of the Terminating Chase (Invited Talk)

Authors: Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

The chase has become a staple of modern database theory with applications in data integration, query optimisation, data exchange, ontology-based query answering, and many other areas. Most application scenarios and implementations require the chase to terminate and produce a finite universal model, and a large arsenal of sufficient termination criteria is available to guarantee this (generally undecidable) condition. In this invited tutorial, we therefore ask about the expressive power of logical theories for which the chase terminates. Specifically, which database properties can be recognised by such theories, i.e., which Boolean queries can they realise? For the Skolem (semi-oblivious) chase, and almost any known termination criterion, this expressivity is just that of plain Datalog. Surprisingly, this limitation of most prior research does not apply to the chase in general. Indeed, we show that standard-chase-terminating theories can realise queries with data complexities ranging from PTime to non-elementary that are out of reach for the terminating Skolem chase. A "Datalog-first" standard chase that prioritises applications of rules without existential quantifiers makes modelling simpler and, we conjecture, computationally more efficient. This is one of the many open questions raised by our insights, and we conclude with an outlook on the research opportunities in this area.

Cite as

Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph. The Power of the Terminating Chase (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 3:1-3:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{krotzsch_et_al:LIPIcs.ICDT.2019.3,
  author =	{Kr\"{o}tzsch, Markus and Marx, Maximilian and Rudolph, Sebastian},
  title =	{{The Power of the Terminating Chase}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{3:1--3:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.3},
  URN =		{urn:nbn:de:0030-drops-103057},
  doi =		{10.4230/LIPIcs.ICDT.2019.3},
  annote =	{Keywords: Existential rules, Tuple-generating dependencies, all-instances chase termination, expressive power, data complexity}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.4

Counting Triangles under Updates in Worst-Case Optimal Time

Authors: Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time maintenance under updates, which is suboptimal. Our approach can also count all triangles in a static database in the worst-case optimal time needed for enumerating them.
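For comparison with the classical approach, here is a Python sketch of the classical delta-based baseline for the triangle count Q = Σ_{a,b,c} R(a,b)·S(b,c)·T(c,a). It does not implement the paper's space-time tradeoff. Since Q is linear in each relation, inserting (mult = 1) or deleting (mult = −1) a tuple (a, b) of R changes Q by mult · Σ_c S(b,c)·T(c,a). Updates to S and T work symmetrically. The per-update cost is proportional to the number of matching tuples, which can be linear in the database size. The class and method names are hypothetical.

```python
from collections import defaultdict


class TriangleCounter:
    """Classical delta-based maintenance of Q = sum_{a,b,c} R(a,b) * S(b,c) * T(c,a)."""

    def __init__(self):
        # Each relation is indexed on its first attribute and stores multiplicities.
        self.r = defaultdict(dict)  # r[a][b] = multiplicity of R(a, b)
        self.s = defaultdict(dict)  # s[b][c] = multiplicity of S(b, c)
        self.t = defaultdict(dict)  # t[c][a] = multiplicity of T(c, a)
        self.count = 0

    @staticmethod
    def _add(rel, x, y, mult):
        rel[x][y] = rel[x].get(y, 0) + mult

    def update_r(self, a, b, mult=1):
        # delta Q = mult * sum_c S(b, c) * T(c, a)
        delta = sum(m * self.t.get(c, {}).get(a, 0) for c, m in self.s.get(b, {}).items())
        self.count += mult * delta
        self._add(self.r, a, b, mult)

    def update_s(self, b, c, mult=1):
        # delta Q = mult * sum_a T(c, a) * R(a, b)
        delta = sum(m * self.r.get(a, {}).get(b, 0) for a, m in self.t.get(c, {}).items())
        self.count += mult * delta
        self._add(self.s, b, c, mult)

    def update_t(self, c, a, mult=1):
        # delta Q = mult * sum_b R(a, b) * S(b, c)
        delta = sum(m * self.s.get(b, {}).get(c, 0) for b, m in self.r.get(a, {}).items())
        self.count += mult * delta
        self._add(self.t, c, a, mult)


tc = TriangleCounter()
tc.update_r(1, 2)
tc.update_s(2, 3)
tc.update_t(3, 1)
print(tc.count)  # 1
tc.update_r(1, 2, mult=-1)
print(tc.count)  # 0
```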

Cite as

Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Counting Triangles under Updates in Worst-Case Optimal Time. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{kara_et_al:LIPIcs.ICDT.2019.4,
  author =	{Kara, Ahmet and Ngo, Hung Q. and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Counting Triangles under Updates in Worst-Case Optimal Time}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.4},
  URN =		{urn:nbn:de:0030-drops-103068},
  doi =		{10.4230/LIPIcs.ICDT.2019.4},
  annote =	{Keywords: incremental view maintenance, amortized analysis, data skew}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.5

A Formal Framework for Complex Event Processing

Authors: Alejandro Grez, Cristian Riveros, and Martín Ugarte

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP languages lack a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP query languages with clear performance guarantees. In this paper we embark on the task of giving a rigorous and efficient framework for CEP. We propose a formal language for specifying complex events, called CEL, that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by studying the syntactic properties of CEL and propose rewriting optimization techniques for simplifying the evaluation of formulas. Then, we introduce a formal computational model for CEP, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by constant-delay enumeration of the results. Finally, we gather the main results of this work to present an efficient and declarative framework for CEP.

Cite as

Alejandro Grez, Cristian Riveros, and Martín Ugarte. A Formal Framework for Complex Event Processing. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{grez_et_al:LIPIcs.ICDT.2019.5,
  author =	{Grez, Alejandro and Riveros, Cristian and Ugarte, Mart{\'\i}n},
  title =	{{A Formal Framework for Complex Event Processing}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.5},
  URN =		{urn:nbn:de:0030-drops-103079},
  doi =		{10.4230/LIPIcs.ICDT.2019.5},
  annote =	{Keywords: Complex event processing, streaming evaluation, constant delay enumeration}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.6

A Formal Framework for Probabilistic Unclean Databases

Authors: Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: the first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization and captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional notions of repair such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations and, in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples.
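To make the noisy channel view of cleaning concrete, here is a deliberately tiny Python sketch, not one of the paper's representations. It assumes a single column of independent cells, an intention given as a prior over a finite value domain, and a hypothetical realization that keeps a value with probability 1 - p_noise and otherwise replaces it uniformly by another domain value. Cleaning then picks, per cell, the intended value v maximizing P(v) · P(observed | v); under the independence assumption this per-cell choice is also the most probable intended column. All function names are illustrative.

```python
from typing import Dict, List


def channel(observed: str, clean: str, domain_size: int, p_noise: float) -> float:
    """Toy realization model P(observed | clean) (an assumption): the value is
    kept with probability 1 - p_noise, else replaced uniformly by another value."""
    if observed == clean:
        return 1.0 - p_noise
    return p_noise / (domain_size - 1)


def clean_cell(observed: str, prior: Dict[str, float], p_noise: float) -> str:
    """Most probable intended value: argmax over v of P(v) * P(observed | v)."""
    n = len(prior)
    return max(prior, key=lambda v: prior[v] * channel(observed, v, n, p_noise))


def clean_column(observed: List[str], prior: Dict[str, float],
                 p_noise: float) -> List[str]:
    # Cells are assumed independent, so cleaning each cell separately
    # yields the most probable intended column under this toy model.
    return [clean_cell(o, prior, p_noise) for o in observed]
```

With the prior {"Lisbon": 0.8, "Porto": 0.15, "Braga": 0.05} and p_noise = 0.2, the observed column ["Braga", "Porto", "Lisbon"] is cleaned to ["Lisbon", "Porto", "Lisbon"]: the rare value "Braga" is judged more likely to be noise on "Lisbon" (0.8 · 0.1 = 0.08) than an intended value (0.05 · 0.8 = 0.04), while "Porto" survives (0.15 · 0.8 = 0.12 versus 0.08).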

Cite as

Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A Formal Framework for Probabilistic Unclean Databases. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{desa_et_al:LIPIcs.ICDT.2019.6,
  author =	{De Sa, Christopher and Ilyas, Ihab F. and Kimelfeld, Benny and R\'{e}, Christopher and Rekatsinas, Theodoros},
  title =	{{A Formal Framework for Probabilistic Unclean Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{6:1--6:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.6},
  URN =		{urn:nbn:de:0030-drops-103083},
  doi =		{10.4230/LIPIcs.ICDT.2019.6},
  annote =	{Keywords: Unclean databases, data cleaning, probabilistic databases, noisy channel}
}
