27 Search Results for "Calautti, Marco"


Volume

LIPIcs, Volume 127

22nd International Conference on Database Theory (ICDT 2019)

ICDT 2019, March 26-28, 2019, Lisbon, Portugal

Editors: Pablo Barcelo and Marco Calautti

Document
Complete Volume
LIPIcs, Volume 127, ICDT'19, Complete Volume

Authors: Pablo Barcelo and Marco Calautti

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
LIPIcs, Volume 127, ICDT'19, Complete Volume

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@Proceedings{barcelo_et_al:LIPIcs.ICDT.2019,
  title =	{{LIPIcs, Volume 127, ICDT'19, Complete Volume}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019},
  URN =		{urn:nbn:de:0030-drops-103630},
  doi =		{10.4230/LIPIcs.ICDT.2019},
  annote =	{Keywords: Computing Methodologies, Knowledge Representation and Reasoning, Theory of computation, Data modeling, Incomplete, inconsistent and uncertain databases, Information systems, Data management systems, Data streams, Database query processing, Incomplete data, Inconsistent data, Relational database model}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Pablo Barcelo and Marco Calautti

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2019.0,
  author =	{Barcelo, Pablo and Calautti, Marco},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.0},
  URN =		{urn:nbn:de:0030-drops-103020},
  doi =		{10.4230/LIPIcs.ICDT.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Learning Models over Relational Databases (Invited Talk)

Authors: Dan Olteanu

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
In this talk, I will make the case for a first-principles approach to machine learning over relational databases that exploits recent developments in database systems and theory. The input to learning classification and regression models is defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using statistical software packages. These three steps are expensive and unnecessary. Instead, one can cast the machine learning problem as a database problem by decomposing the learning task into a batch of aggregates over the feature extraction query and by computing this batch over the input database. The performance of this database-centric approach benefits tremendously from structural properties of the relational data and of the feature extraction query; such properties may be algebraic (semi-ring), combinatorial (hypertree width), or statistical (sampling). It also benefits from database systems techniques such as factorized query evaluation and query compilation. For a variety of models, including factorization machines, decision trees, and support vector machines, this approach may come with lower computational complexity than the materialization of the training dataset used by the mainstream approach. Recent results show that this translates to several orders-of-magnitude speed-ups over state-of-the-art systems such as TensorFlow, R, Scikit-learn, and mlpack. While these initial results are promising, there is much more waiting to be discovered.
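
The aggregate-pushdown idea sketched above can be illustrated with a toy example. The following Python sketch (an assumed illustration, not the systems or algorithms discussed in the talk) fits a one-dimensional least-squares model over the join R(a, x) ⋈ S(a, y) by combining per-key partial aggregates of R and S, never materializing the join.

# Toy sketch: least squares over a join computed from a batch of aggregates,
# without materializing the join result (assumed example, not the talk's system).
from collections import defaultdict

R = [(1, 2.0), (1, 3.0), (2, 1.0)]    # tuples (a, x)
S = [(1, 10.0), (2, 5.0), (2, 7.0)]   # tuples (a, y)

# Per-key partial aggregates: count, sum, sum of squares.
aggR = defaultdict(lambda: [0, 0.0, 0.0])
for a, x in R:
    c = aggR[a]; c[0] += 1; c[1] += x; c[2] += x * x
aggS = defaultdict(lambda: [0, 0.0, 0.0])
for a, y in S:
    c = aggS[a]; c[0] += 1; c[1] += y; c[2] += y * y

# Combine partial aggregates per join key instead of enumerating the join.
n = sx = sy = sxx = sxy = 0.0
for a in aggR.keys() & aggS.keys():
    (cR, sR, ssR), (cS, sS, _) = aggR[a], aggS[a]
    n   += cR * cS    # number of join tuples for this key
    sx  += sR * cS    # SUM(x) over the join
    sy  += cR * sS    # SUM(y) over the join
    sxx += ssR * cS   # SUM(x^2) over the join
    sxy += sR * sS    # SUM(x*y) over the join

# Closed-form least squares y ~ b0 + b1*x from the aggregates alone.
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b0 = (sy - b1 * sx) / n
print(b0, b1)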

Cite as

Dan Olteanu. Learning Models over Relational Databases (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{olteanu:LIPIcs.ICDT.2019.1,
  author =	{Olteanu, Dan},
  title =	{{Learning Models over Relational Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.1},
  URN =		{urn:nbn:de:0030-drops-103034},
  doi =		{10.4230/LIPIcs.ICDT.2019.1},
  annote =	{Keywords: In-database analytics, Data complexity, Feature extraction queries, Database dependencies, Model reparameterization}
}
Document
Invited Talk
The Power of Relational Learning (Invited Talk)

Authors: Lise Getoor

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We live in a richly interconnected world and, not surprisingly, we generate richly interconnected data. From smart cities to social media to financial networks to biological networks, data is relational. While database theory is built on strong relational foundations, the same is not true for machine learning. The majority of machine learning methods flatten data into a single table before performing any processing. Further, database theory is also built on a bedrock of declarative representations. The same is not true for machine learning, in particular deep learning, which results in black-box, uninterpretable and unexplainable models. In this talk, I will introduce the field of statistical relational learning, an alternative machine learning approach based on declarative relational representations paired with probabilistic models. I’ll describe our work on probabilistic soft logic, a probabilistic programming language that is ideally suited to richly connected, noisy data. Our recent results show that by building on state-of-the-art optimization methods in a distributed implementation, we can solve very large relational learning problems orders of magnitude faster than existing approaches.
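
A rough numerical illustration of the "declarative rules paired with probabilistic models" idea (a generic sketch, not probabilistic soft logic's actual API or syntax): under the Łukasiewicz-style hinge relaxation commonly used in probabilistic soft logic, a weighted rule body => head over [0,1]-valued atoms contributes the penalty weight * max(0, body - head).

# Hinge penalty of a weighted soft rule over [0,1] truth values
# (assumed illustration only; not the PSL implementation).

def lukasiewicz_and(values):
    # Lukasiewicz conjunction of truth values in [0, 1].
    return max(0.0, sum(values) - (len(values) - 1))

def distance_to_satisfaction(body_values, head_value):
    # The rule body => head is satisfied when head >= body; otherwise the
    # penalty is the hinge max(0, body - head).
    return max(0.0, lukasiewicz_and(body_values) - head_value)

# Example rule: Friends(a,b) & Votes(a,p) => Votes(b,p), weight 2.0
weight = 2.0
penalty = weight * distance_to_satisfaction([0.9, 0.8], 0.4)
print(penalty)   # 2.0 * max(0, 0.7 - 0.4) = 0.6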

Cite as

Lise Getoor. The Power of Relational Learning (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{getoor:LIPIcs.ICDT.2019.2,
  author =	{Getoor, Lise},
  title =	{{The Power of Relational Learning}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.2},
  URN =		{urn:nbn:de:0030-drops-103048},
  doi =		{10.4230/LIPIcs.ICDT.2019.2},
  annote =	{Keywords: Machine learning, Probabilistic soft logic, Relational model}
}
Document
Invited Talk
The Power of the Terminating Chase (Invited Talk)

Authors: Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
The chase has become a staple of modern database theory with applications in data integration, query optimisation, data exchange, ontology-based query answering, and many other areas. Most application scenarios and implementations require the chase to terminate and produce a finite universal model, and a large arsenal of sufficient termination criteria is available to guarantee this (generally undecidable) condition. In this invited tutorial, we therefore ask about the expressive power of logical theories for which the chase terminates. Specifically, which database properties can be recognised by such theories, i.e., which Boolean queries can they realise? For the skolem (semi-oblivious) chase, and almost any known termination criterion, this expressivity is just that of plain Datalog. Surprisingly, this limitation of most prior research does not apply to the chase in general. Indeed, we show that standard-chase-terminating theories can realise queries with data complexities ranging from PTime to non-elementary that are out of reach for the terminating skolem chase. A "Datalog-first" standard chase that prioritises applications of rules without existential quantifiers makes modelling simpler - and, we conjecture, computationally more efficient. This is one of the many open questions raised by our insights, and we conclude with an outlook on the research opportunities in this area.
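
To make the terminology concrete, here is a minimal assumed sketch (not the paper's machinery) of one standard-chase step for a single existential rule: a fresh labelled null is introduced only when no existing tuple already satisfies the rule head, which is precisely where the standard chase differs from the skolem (semi-oblivious) variant.

# One standard-chase step for the existential rule
#   Person(x) -> exists y. HasParent(x, y)
# (assumed toy example; the paper treats arbitrary existential rules).
from itertools import count

fresh = (f"_n{i}" for i in count())   # generator of labelled nulls

def standard_chase_step(db):
    changed = False
    for (x,) in list(db["Person"]):
        # Standard chase: fire the rule only if the head is not yet satisfied.
        if not any(a == x for (a, _) in db["HasParent"]):
            db["HasParent"].add((x, next(fresh)))
            changed = True
    return changed

db = {"Person": {("alice",), ("bob",)},
      "HasParent": {("alice", "carol")}}

while standard_chase_step(db):
    pass
print(db["HasParent"])   # alice keeps carol; bob gets a fresh null parent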

Cite as

Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph. The Power of the Terminating Chase (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 3:1-3:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{krotzsch_et_al:LIPIcs.ICDT.2019.3,
  author =	{Kr\"{o}tzsch, Markus and Marx, Maximilian and Rudolph, Sebastian},
  title =	{{The Power of the Terminating Chase}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{3:1--3:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.3},
  URN =		{urn:nbn:de:0030-drops-103057},
  doi =		{10.4230/LIPIcs.ICDT.2019.3},
  annote =	{Keywords: Existential rules, Tuple-generating dependencies, all-instances chase termination, expressive power, data complexity}
}
Document
Counting Triangles under Updates in Worst-Case Optimal Time

Authors: Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time maintenance under updates, which is suboptimal. Our approach can also count all triangles in a static database in the worst-case optimal time needed for enumerating them.
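
For intuition, the classical linear-time delta maintenance mentioned above looks roughly like the following assumed sketch (a baseline, not the paper's worst-case optimal algorithm): inserting an edge into R changes the count of triangles R(a,b), S(b,c), T(c,a) by the number of matching (S, T) completions.

# Baseline incremental maintenance of the triangle count under single-tuple
# inserts into R (assumed sketch; the paper additionally partitions values
# by degree to trade space for sub-linear update time).
from collections import defaultdict

S_by_b = defaultdict(set)   # b -> {c : S(b, c)}
T_by_a = defaultdict(set)   # a -> {c : T(c, a)}
triangle_count = 0

def insert_S(b, c): S_by_b[b].add(c)
def insert_T(c, a): T_by_a[a].add(c)

def insert_R(a, b):
    global triangle_count
    # Delta query: new triangles are completions through a shared c.
    triangle_count += len(S_by_b[b] & T_by_a[a])

insert_S(2, 3); insert_T(3, 1); insert_S(2, 4)
insert_R(1, 2)
print(triangle_count)   # 1: the triangle (1, 2, 3)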

Cite as

Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Counting Triangles under Updates in Worst-Case Optimal Time. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{kara_et_al:LIPIcs.ICDT.2019.4,
  author =	{Kara, Ahmet and Ngo, Hung Q. and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Counting Triangles under Updates in Worst-Case Optimal Time}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.4},
  URN =		{urn:nbn:de:0030-drops-103068},
  doi =		{10.4230/LIPIcs.ICDT.2019.4},
  annote =	{Keywords: incremental view maintenance, amortized analysis, data skew}
}
Document
A Formal Framework for Complex Event Processing

Authors: Alejandro Grez, Cristian Riveros, and Martín Ugarte

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP languages lack a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP query languages with clear performance guarantees. In this paper we embark on the task of giving a rigorous and efficient framework to CEP. We propose a formal language for specifying complex events, called CEL, that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by studying the syntactical properties of CEL and propose rewriting optimization techniques for simplifying the evaluation of formulas. Then, we introduce a formal computational model for CEP, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by constant-delay enumeration of the results. Finally, we gather the main results of this work to present an efficient and declarative framework for CEP.
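
As a toy illustration of evaluating a sequencing pattern with a unary filter over a stream (an assumed sketch, not CEL or CEA as defined in the paper), the following matcher enumerates every pair of positions where a temperature reading above a threshold is eventually followed by a smoke event.

# Toy streaming matcher for the pattern  T(val > 40) ; S
# (assumed illustration; the paper compiles CEL formulas into complex
# event automata with constant time per event and constant-delay output).

def match_sequence(stream, first_pred, second_pred):
    open_partial = []                # positions matching the first step so far
    for i, event in enumerate(stream):
        if second_pred(event):
            for j in open_partial:   # enumerate complex events ending here
                yield (j, i)
        if first_pred(event):
            open_partial.append(i)

stream = [{"type": "T", "val": 45}, {"type": "T", "val": 20},
          {"type": "S"}, {"type": "T", "val": 50}, {"type": "S"}]

matches = match_sequence(stream,
                         lambda e: e["type"] == "T" and e["val"] > 40,
                         lambda e: e["type"] == "S")
print(list(matches))   # [(0, 2), (0, 4), (3, 4)]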

Cite as

Alejandro Grez, Cristian Riveros, and Martín Ugarte. A Formal Framework for Complex Event Processing. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{grez_et_al:LIPIcs.ICDT.2019.5,
  author =	{Grez, Alejandro and Riveros, Cristian and Ugarte, Mart{\'\i}n},
  title =	{{A Formal Framework for Complex Event Processing}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.5},
  URN =		{urn:nbn:de:0030-drops-103079},
  doi =		{10.4230/LIPIcs.ICDT.2019.5},
  annote =	{Keywords: Complex event processing, streaming evaluation, constant delay enumeration}
}
Document
A Formal Framework for Probabilistic Unclean Databases

Authors: Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: The first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization and that captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional concepts of repairs such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations, and in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples.
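
A minimal numerical sketch of the cleaning problem (an assumed toy model, not the representations studied in the paper): given a prior over candidate intended databases and a noise model giving the probability of the observation under each, MAP cleaning picks the intention maximizing prior times likelihood.

# Toy MAP cleaning in the spirit of the PUD framework (assumed example):
# pick  I* = argmax_I  P(I) * P(observed | I).

candidate_intentions = {
    "I1": {("alice", "NYC")},
    "I2": {("alice", "NY")},
}
prior = {"I1": 0.7, "I2": 0.3}       # belief over clean databases
observed = {("alice", "NY")}

def likelihood(observed_db, intended_db, noise=0.1):
    # Simple noise model: each intended tuple is corrupted independently
    # with probability `noise`, otherwise observed unchanged.
    p = 1.0
    for t in intended_db:
        p *= (1 - noise) if t in observed_db else noise
    return p

best = max(candidate_intentions,
           key=lambda name: prior[name] * likelihood(observed, candidate_intentions[name]))
print(best)   # "I2": its higher likelihood outweighs the lower prior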

Cite as

Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A Formal Framework for Probabilistic Unclean Databases. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{desa_et_al:LIPIcs.ICDT.2019.6,
  author =	{De Sa, Christopher and Ilyas, Ihab F. and Kimelfeld, Benny and R\'{e}, Christopher and Rekatsinas, Theodoros},
  title =	{{A Formal Framework for Probabilistic Unclean Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{6:1--6:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.6},
  URN =		{urn:nbn:de:0030-drops-103083},
  doi =		{10.4230/LIPIcs.ICDT.2019.6},
  annote =	{Keywords: Unclean databases, data cleaning, probabilistic databases, noisy channel}
}
Document
On the Expressive Power of Linear Algebra on Graphs

Authors: Floris Geerts

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Most graph query languages are rooted in logic. By contrast, in this paper we consider graph query languages rooted in linear algebra. More specifically, we consider MATLANG, a recently introduced matrix query language that supports basic linear algebra functionality. We investigate the problem of characterising equivalence of graphs, represented by their adjacency matrices, for various fragments of MATLANG. A complete picture is painted of the impact of the linear algebra operations in MATLANG on the ability of these fragments to distinguish graphs.
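
A concrete assumed illustration of the kind of expression such a language evaluates (not one of the paper's fragments): the trace of A^k counts closed walks of length k, and two graphs can already differ on this single invariant for small k.

# Comparing one linear-algebra invariant of two adjacency matrices
# (assumed example; the paper characterises entire MATLANG fragments).
import numpy as np

def closed_walks(adjacency, k):
    # tr(A^k) = number of closed walks of length k.
    return np.trace(np.linalg.matrix_power(adjacency, k))

# A 4-cycle versus a triangle plus an isolated vertex.
C4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
K3_plus_v = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])

for k in (2, 3):
    print(k, closed_walks(C4, k), closed_walks(K3_plus_v, k))
# k = 2 gives 8 vs 6 (twice the edge count); k = 3 gives 0 vs 6 (six times
# the triangle count), so this invariant already distinguishes the graphs.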

Cite as

Floris Geerts. On the Expressive Power of Linear Algebra on Graphs. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{geerts:LIPIcs.ICDT.2019.7,
  author =	{Geerts, Floris},
  title =	{{On the Expressive Power of Linear Algebra on Graphs}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.7},
  URN =		{urn:nbn:de:0030-drops-103093},
  doi =		{10.4230/LIPIcs.ICDT.2019.7},
  annote =	{Keywords: matrix query languages, graph queries, graph theory, logics}
}
Document
Fragments of Bag Relational Algebra: Expressiveness and Certain Answers

Authors: Marco Console, Paolo Guagliardo, and Leonid Libkin

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra.
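
The bag operations named above have simple multiset semantics; the following assumed sketch shows the standard definitions on Python Counters (an illustration, not the paper's fragments).

# Multiset semantics of the bag-algebra operations mentioned in the abstract
# (assumed standard definitions, illustrated on Python Counters).
from collections import Counter

R = Counter({"a": 2, "b": 1})
S = Counter({"a": 1, "c": 3})

additive_union   = Counter({t: R[t] + S[t] for t in R.keys() | S.keys()})
max_union        = Counter({t: max(R[t], S[t]) for t in R.keys() | S.keys()})
min_intersection = Counter({t: min(R[t], S[t]) for t in R.keys() & S.keys()})
duplicate_elim   = Counter({t: 1 for t in R})

print(sorted(additive_union.items()))    # [('a', 3), ('b', 1), ('c', 3)]
print(sorted(max_union.items()))         # [('a', 2), ('b', 1), ('c', 3)]
print(sorted(min_intersection.items()))  # [('a', 1)]
print(sorted(duplicate_elim.items()))    # [('a', 1), ('b', 1)]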

Cite as

Marco Console, Paolo Guagliardo, and Leonid Libkin. Fragments of Bag Relational Algebra: Expressiveness and Certain Answers. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{console_et_al:LIPIcs.ICDT.2019.8,
  author =	{Console, Marco and Guagliardo, Paolo and Libkin, Leonid},
  title =	{{Fragments of Bag Relational Algebra: Expressiveness and Certain Answers}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.8},
  URN =		{urn:nbn:de:0030-drops-103106},
  doi =		{10.4230/LIPIcs.ICDT.2019.8},
  annote =	{Keywords: bag semantics, relational algebra, expressivity, certain answers, complexity}
}
Document
Categorical Range Reporting with Frequencies

Authors: Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
In this paper, we consider a variant of the color range reporting problem called color reporting with frequencies. Our goal is to pre-process a set of colored points into a data structure, so that given a query range Q, we can report all colors that appear in Q, along with their respective frequencies. In other words, for each reported color, we also output the number of times it occurs in Q. We describe an external-memory data structure that uses O(N(1 + log^2 D / log N)) words and answers one-dimensional queries in O(1 + K/B) I/Os, where N is the total number of points in the data structure, D is the total number of colors in the data structure, K is the number of reported colors, and B is the block size. Next we turn to an approximate version of this problem: report all colors that appear in the query range; for every reported color, we provide a constant-factor approximation on its frequency. We consider color reporting with approximate frequencies in two dimensions. Our data structure uses O(N) space and answers two-dimensional queries in O(log_B N + log^* B + K/B) I/Os in the special case when the query range is bounded on two sides. As a corollary, we can also answer one-dimensional approximate queries within the same time and space bounds.
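
For concreteness, the query semantics (report each distinct color in the range together with its frequency) reduces to the following brute-force assumed baseline, which of course has none of the data structure's I/O guarantees.

# Brute-force one-dimensional color reporting with frequencies
# (assumed sketch of the problem statement; the paper builds an
# external-memory index answering queries in O(1 + K/B) I/Os).
from collections import Counter

def color_report(points, lo, hi):
    # points: iterable of (coordinate, color); query range is [lo, hi].
    return Counter(color for x, color in points if lo <= x <= hi)

points = [(1, "red"), (2, "blue"), (3, "red"), (5, "green"), (8, "blue")]
print(color_report(points, 2, 6))   # blue, red and green each occur once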

Cite as

Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan. Categorical Range Reporting with Frequencies. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{ganguly_et_al:LIPIcs.ICDT.2019.9,
  author =	{Ganguly, Arnab and Munro, J. Ian and Nekrich, Yakov and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Categorical Range Reporting with Frequencies}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.9},
  URN =		{urn:nbn:de:0030-drops-103115},
  doi =		{10.4230/LIPIcs.ICDT.2019.9},
  annote =	{Keywords: Data Structures, Range Reporting, Range Counting, Categorical Range Reporting, Orthogonal Range Query}
}
Document
Approximating Distance Measures for the Skyline

Authors: Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.
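
For readers unfamiliar with the object being summarized, here is a minimal assumed sketch of computing the 2-D skyline itself (the Pareto-optimal points when larger is better in both coordinates); the paper's problem is then to pick k of these points whose hull stays Hausdorff-close to the full hull.

# Minimal 2-D skyline (Pareto frontier), "larger is better" in both
# coordinates (assumed sketch; the paper studies k-point summaries of it).

def skyline(points):
    result = []
    best_y = float("-inf")
    # Sweep by decreasing x: a point is on the skyline iff its y beats
    # every point with larger x.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result

pts = [(1, 9), (3, 7), (4, 4), (6, 5), (7, 2), (2, 8)]
print(skyline(pts))   # [(7, 2), (6, 5), (3, 7), (2, 8), (1, 9)]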

Cite as

Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk. Approximating Distance Measures for the Skyline. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{kumar_et_al:LIPIcs.ICDT.2019.10,
  author =	{Kumar, Nirman and Raichel, Benjamin and Sintos, Stavros and Van Buskirk, Gregory},
  title =	{{Approximating Distance Measures for the Skyline}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.10},
  URN =		{urn:nbn:de:0030-drops-103125},
  doi =		{10.4230/LIPIcs.ICDT.2019.10},
  annote =	{Keywords: Skyline, Pareto optimal, Approximation, Hardness, Multi-criteria decision making}
}
Document
Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees

Authors: Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin’s well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified against the similarity threshold. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.
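
The query itself is easy to state; below is a brute-force assumed baseline that uses the same normalization observation (for unit vectors, cosine similarity is a dot product) but none of the paper's index traversal or stopping conditions.

# Brute-force cosine threshold query over normalized vectors
# (assumed baseline; the paper answers such queries by partially
# traversing an inverted index with a tight stopping condition).
import numpy as np

def cosine_threshold_query(database, query, theta):
    # Normalize once; for unit vectors, cosine(u, v) = u . v.
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.nonzero(db @ q >= theta)[0]

database = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.5]])
query = np.array([1.0, 0.2])
print(cosine_threshold_query(database, query, 0.8))   # vectors 0 and 1 qualify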

Cite as

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou. Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{li_et_al:LIPIcs.ICDT.2019.11,
  author =	{Li, Yuliang and Wang, Jianguo and Pullman, Benjamin and Bandeira, Nuno and Papakonstantinou, Yannis},
  title =	{{Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.11},
  URN =		{urn:nbn:de:0030-drops-103135},
  doi =		{10.4230/LIPIcs.ICDT.2019.11},
  annote =	{Keywords: Vector databases, Similarity search, Cosine, Threshold Algorithm}
}
Document
An Experimental Study of the Treewidth of Real-World Graph Data

Authors: Silviu Maniu, Pierre Senellart, and Suraj Jog

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Treewidth is a parameter that measures how tree-like a relational instance is, and whether it can reasonably be decomposed into a tree. Many computation tasks are known to be tractable on databases of small treewidth, but computing the treewidth of a given instance is intractable. This article is the first large-scale experimental study of treewidth and tree decompositions of real-world database instances (25 datasets from 8 different domains, with sizes ranging from a few thousand to a few million vertices). The goal is to determine which data, if any, can benefit from the wealth of algorithms for databases of small treewidth. For each dataset, we obtain upper and lower bound estimates of their treewidth, and study the properties of their tree decompositions. We show in particular that, even when treewidth is high, using partial tree decompositions can result in data structures that can assist algorithms.
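
For readers who want to reproduce the flavour of such measurements, networkx ships heuristic upper-bound routines; the following is an assumed minimal sketch (the study itself relies on dedicated solvers and also computes lower bounds).

# Heuristic treewidth upper bound via networkx's approximation module
# (assumed sketch with a recent networkx release; not the paper's toolchain).
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

G = nx.karate_club_graph()                 # a small real-world graph
width, decomposition = treewidth_min_degree(G)

print("upper bound on treewidth:", width)
print("bags in the tree decomposition:", decomposition.number_of_nodes())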

Cite as

Silviu Maniu, Pierre Senellart, and Suraj Jog. An Experimental Study of the Treewidth of Real-World Graph Data. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{maniu_et_al:LIPIcs.ICDT.2019.12,
  author =	{Maniu, Silviu and Senellart, Pierre and Jog, Suraj},
  title =	{{An Experimental Study of the Treewidth of Real-World Graph Data}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.12},
  URN =		{urn:nbn:de:0030-drops-103147},
  doi =		{10.4230/LIPIcs.ICDT.2019.12},
  annote =	{Keywords: Treewidth, Graph decompositions, Experiments, Query processing}
}
