Volume

LIPIcs, Volume 48

19th International Conference on Database Theory (ICDT 2016)




Event

ICDT 2016, March 15-18, 2016, Bordeaux, France

Editors

Wim Martens
Thomas Zeume

Publication Details

  • Published: 2016-03-14
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-002-6
  • DBLP: db/conf/icdt/icdt2016

Documents

Document
Complete Volume
LIPIcs, Volume 48, ICDT'16, Complete Volume

Authors: Wim Martens and Thomas Zeume


Abstract
LIPIcs, Volume 48, ICDT'16, Complete Volume

Cite as

19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@Proceedings{martens_et_al:LIPIcs.ICDT.2016,
  title =	{{LIPIcs, Volume 48, ICDT'16, Complete Volume}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016},
  URN =		{urn:nbn:de:0030-drops-57991},
  doi =		{10.4230/LIPIcs.ICDT.2016},
  annote =	{Keywords: Database Management, Normal forms, Schema and subschema, Query languages, Query processing, Relational databases, Distributed databases, Heterogeneous Databases, Online Information Services, Miscellaneous – Privacy, Office Automation: Workflow management}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers, List of Authors

Authors: Wim Martens and Thomas Zeume


Abstract
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers, List of Authors

Cite as

19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{martens_et_al:LIPIcs.ICDT.2016.0,
  author =	{Martens, Wim and Zeume, Thomas},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers, List of Authors}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.0},
  URN =		{urn:nbn:de:0030-drops-57940},
  doi =		{10.4230/LIPIcs.ICDT.2016.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers, List of Authors}
}
Document
The ICDT 2016 Test of Time Award Announcement

Authors: Foto N. Afrati, Claire David, and Georg Gottlob


Abstract
We describe the 2016 ICDT Test of Time Award which is awarded to Chandra Chekuri and Anand Rajaraman for their 1997 ICDT paper on "Conjunctive Query Containment Revisited".

Cite as

Foto N. Afrati, Claire David, and Georg Gottlob. The ICDT 2016 Test of Time Award Announcement. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{afrati_et_al:LIPIcs.ICDT.2016.1,
  author =	{Afrati, Foto N. and David, Claire and Gottlob, Georg},
  title =	{{The ICDT 2016 Test of Time Award Announcement}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.1},
  URN =		{urn:nbn:de:0030-drops-57938},
  doi =		{10.4230/LIPIcs.ICDT.2016.1},
  annote =	{Keywords: conjunctive query, treewidth, NP-hardness, rewriting}
}
Document
Invited Talk
Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk)

Authors: Floris Geerts


Abstract
Large datasets introduce challenges to the scalability of query answering. Given a query Q and a dataset D, it is often prohibitively costly to compute the query answers Q(D) when D is big. To this end, one may want to use heuristics, "quick and dirty" algorithms which return approximate answers. However, in many applications it is a must to find exact query answers. So, how can we efficiently compute Q(D) when D is big or when we only have limited resources? One idea is to find a small subset D_Q of D such that Q(D_Q)=Q(D) where the size of D_Q is independent of the size of the underlying dataset D. Intuitively, when such a D_Q can be found for a query Q, the query is said to be scale independent (Armbrust et al. 2011, Armbrust et al. 2013, Fan et al. 2014). Indeed, for answering such queries the size of the underlying database does not matter, i.e., query processing is independent of the scale of the database. In this talk, I will survey various formalisms that enable large classes of queries to be scale independent. These formalisms primarily rely on the availability of access constraints, a combination of indexes and cardinality constraints, on the data (Fan et al. 2015, Fan et al. 2014). We will take a closer look at how, in the presence of such constraints, queries can often be compiled into efficient query plans that access a bounded amount of data (Cao et al. 2014, Fan et al. 2015), and how these techniques relate to query processing in the presence of access patterns (Benedikt et al. 2015, Benedikt et al. 2014, Deutsch et al. 2007). Finally, we illustrate that scale independent queries are quite common in practice and that they indeed can be efficiently answered on big datasets when access constraints are present (Cao et al. 2015, Cao et al. 2014).
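
To make the idea concrete, here is a minimal Python sketch of a scale-independent query under an invented posts(person, timestamp, text) schema (all names are illustrative, not from the talk): with the access constraint that an index returns one person's posts sorted by time, answering "the k most recent posts of person x" touches at most k tuples, a D_Q whose size is independent of |D|.

from bisect import insort

# Hypothetical schema posts(person, timestamp, text); the index keeps each
# person's posts sorted by timestamp. All names here are illustrative.
index = {}

def insert_post(person, ts, text):
    insort(index.setdefault(person, []), (ts, text))

def latest_posts(person, k=10):
    # Under the access constraint "one index lookup returns the posts of a
    # single person, sorted by time", this reads at most k tuples: a small
    # D_Q with Q(D_Q) = Q(D), no matter how large the database grows.
    return list(reversed(index.get(person, [])[-k:]))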

Cite as

Floris Geerts. Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk). In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{geerts:LIPIcs.ICDT.2016.2,
  author =	{Geerts, Floris},
  title =	{{Scale Independence: Using Small Data to Answer Queries on Big Data}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{2:1--2:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.2},
  URN =		{urn:nbn:de:0030-drops-57715},
  doi =		{10.4230/LIPIcs.ICDT.2016.2},
  annote =	{Keywords: Scale independence, Access constraints, Query processing}
}
Document
Invited Talk
Top-k Indexes Made Small and Sweet (Invited Talk)

Authors: Yufei Tao


Abstract
Top-k queries have become extremely popular in the database community. Such a query, which is issued on a set of elements each carrying a real-valued weight, returns the k elements with the highest weights among all the elements that satisfy a predicate. As usual, an index structure is necessary to answer a query substantially faster than accessing the whole input set. The existing research on top-k queries can be classified in two categories. The first one, which is system-oriented, aims to devise indexes that are simple to understand and easy to implement. These indexes, typically designed with heuristics, are reasonably fast in practical applications, but do not necessarily offer strong performance guarantees - in other words, they are small but not sweet. The other category, which is theory-oriented, aims to develop indexes that promise attractive bounds on the space consumption and query overhead (sometimes also update cost). These indexes, unfortunately, are often excessively sophisticated in the adopted techniques, and are rarely applied in practice - they are sweet but not small. This talk will discuss the progress of an on-going project that strives to take down the barrier between the two categories, by crafting a framework for acquiring simple top-k indexes with excellent performance guarantees - namely, small and sweet. This is achieved with reductions that produce top-k indexes automatically from the existing data structures for conventional reporting queries on unweighted elements (i.e., finding all elements satisfying a predicate), and/or the existing data structures on top-1 queries. Our reductions promise nearly no performance deterioration with respect to those existing structures, are general enough to be applicable to a huge variety of top-k problems, and work in both the external memory model and the RAM model.
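
One simple flavor of such a reduction, sketched under our own assumptions rather than as the paper's construction: given a reporting structure that returns all elements of weight at least tau satisfying the predicate, a top-k query can geometrically lower tau until at least k elements are reported.

import bisect

class TopKFromReporting:
    # Illustrative reduction, not the paper's: report(tau, pred) stands in
    # for an existing reporting structure returning all elements of weight
    # >= tau that satisfy pred. Top-k halves tau until k elements appear.
    def __init__(self, items):                 # items: list of (weight, value)
        self.items = sorted(items, key=lambda t: t[0])
        self.weights = [w for w, _ in self.items]

    def report(self, tau, pred):
        i = bisect.bisect_left(self.weights, tau)
        return [(w, v) for (w, v) in self.items[i:] if pred(v)]

    def topk(self, k, pred):
        tau = self.weights[-1]                 # assumes positive weights
        while True:
            res = self.report(tau, pred)
            if len(res) >= k or tau <= self.weights[0]:
                return sorted(res, key=lambda t: -t[0])[:k]
            tau /= 2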

Cite as

Yufei Tao. Top-k Indexes Made Small and Sweet (Invited Talk). In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{tao:LIPIcs.ICDT.2016.3,
  author =	{Tao, Yufei},
  title =	{{Top-k Indexes Made Small and Sweet}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.3},
  URN =		{urn:nbn:de:0030-drops-57725},
  doi =		{10.4230/LIPIcs.ICDT.2016.3},
  annote =	{Keywords: Data Structures, Top-k, External Memory, RAM, Reductions}
}
Document
Invited Talk
New Algorithms for Heavy Hitters in Data Streams (Invited Talk)

Authors: David P. Woodruff


Abstract
An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-k, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the l_1-heavy hitters and l_2-heavy hitters. There are a number of algorithmic solutions for these problems, starting with the work of Misra and Gries, as well as the CountMin and CountSketch data structures, among others. In this paper (accompanying an invited talk) we cover several recent results developed in this area, which improve upon the classical solutions to these problems. In particular, we develop new algorithms for finding l_1-heavy hitters and l_2-heavy hitters, with significantly less memory required than what was known, and which are optimal in a number of parameter regimes.
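
As a concrete point of reference for the classical baseline named above, here is a minimal Python sketch of the Misra-Gries summary for l_1-heavy hitters (the example stream is ours):

def misra_gries(stream, k):
    # One pass, at most k-1 counters; every item occurring more than n/k
    # times in a stream of length n is guaranteed to survive.
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            for key in list(counters):         # decrement all counters
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters                            # candidate heavy hitters

# 'a' occurs 6 times in a stream of length 9 (> n/2), so it must survive:
print(misra_gries("aabacadaa", k=2))           # {'a': 3}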

Cite as

David P. Woodruff. New Algorithms for Heavy Hitters in Data Streams (Invited Talk). In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{woodruff:LIPIcs.ICDT.2016.4,
  author =	{Woodruff, David P.},
  title =	{{New Algorithms for Heavy Hitters in Data Streams}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{4:1--4:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.4},
  URN =		{urn:nbn:de:0030-drops-57739},
  doi =		{10.4230/LIPIcs.ICDT.2016.4},
  annote =	{Keywords: data streams, heavy hitters}
}
Document
Beyond Well-designed SPARQL

Authors: Mark Kaminski and Egor V. Kostylev


Abstract
SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive - query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching - query answering becomes coNP-complete. However, well-designed SPARQL captures far from all real-life queries - in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures about 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment's expressive power is strictly between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.

Cite as

Mark Kaminski and Egor V. Kostylev. Beyond Well-designed SPARQL. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{kaminski_et_al:LIPIcs.ICDT.2016.5,
  author =	{Kaminski, Mark and Kostylev, Egor V.},
  title =	{{Beyond Well-designed SPARQL}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.5},
  URN =		{urn:nbn:de:0030-drops-57744},
  doi =		{10.4230/LIPIcs.ICDT.2016.5},
  annote =	{Keywords: RDF, Query languages, SPARQL, Optional matching}
}
Document
A Framework for Estimating Stream Expression Cardinalities

Authors: Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin Thaler


Abstract
Given m distributed data streams A_1,..., A_m, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over A_1,..., A_m. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a novel tradeoff between accuracy, space usage, update speed, and applicability.
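
For orientation, here is a minimal Python sketch of one classical member of this family, the k-minimum-values (KMV) distinct-count estimator, including the merge operation that underlies set-expression estimates (the sketch size and use of SHA-256 are illustrative choices of ours):

import hashlib

K = 256                                        # sketch size (illustrative)

def h(x):                                      # hash identifiers into (0, 1)
    d = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(d[:8], "big") / 2**64

def kmv(stream):
    # Keep the K smallest distinct hash values seen in the stream.
    return sorted({h(x) for x in stream})[:K]

def estimate(sketch):
    if len(sketch) < K:                        # fewer than K distinct items:
        return len(sketch)                     # the count is exact
    return (K - 1) / sketch[K - 1]             # classic KMV estimator

def union(s1, s2):
    # The sketch of A ∪ B is computable from the sketches of A and B alone.
    return sorted(set(s1) | set(s2))[:K]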

Cite as

Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin Thaler. A Framework for Estimating Stream Expression Cardinalities. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 6:1-6:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{dasgupta_et_al:LIPIcs.ICDT.2016.6,
  author =	{Dasgupta, Anirban and Lang, Kevin J. and Rhodes, Lee and Thaler, Justin},
  title =	{{A Framework for Estimating Stream Expression Cardinalities}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{6:1--6:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.6},
  URN =		{urn:nbn:de:0030-drops-57754},
  doi =		{10.4230/LIPIcs.ICDT.2016.6},
  annote =	{Keywords: sketching, data stream algorithms, mergeability, distinct elements, set operations}
}
Document
Declarative Probabilistic Programming with Datalog

Authors: Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena


Abstract
Probabilistic programming languages are used for developing statistical models, and they typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate an extension of Datalog for specifying statistical models, and establish a declarative probabilistic-programming paradigm over databases. Our proposed extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
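
The flavor of the paradigm can be conveyed by a toy generative program (schema and numbers are invented for this sketch, not taken from the paper): a rule draws a value from a numerical distribution for every extensional fact, and an observation conditions the induced probability space.

import random

houses = ["h1", "h2", "h3"]                    # extensional relation House

def sample_outcome():
    # Rule with a parameterized conclusion: Size(h, Normal[100, 20]) :- House(h).
    return {h: random.gauss(100, 20) for h in houses}

def observation(outcome):
    # Integrity constraint restricting the space to a conditional subspace.
    return outcome["h1"] >= 120

# Rejection sampling over possible outcomes approximates the posterior:
accepted = [o for o in (sample_outcome() for _ in range(10000))
            if observation(o)]
print(sum(o["h2"] for o in accepted) / len(accepted))   # E[Size(h2) | obs]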

Cite as

Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{barany_et_al:LIPIcs.ICDT.2016.7,
  author =	{Barany, Vince and ten Cate, Balder and Kimelfeld, Benny and Olteanu, Dan and Vagena, Zografoula},
  title =	{{Declarative Probabilistic Programming with Datalog}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.7},
  URN =		{urn:nbn:de:0030-drops-57761},
  doi =		{10.4230/LIPIcs.ICDT.2016.7},
  annote =	{Keywords: Chase, Datalog, probability measure space, probabilistic programming}
}
Document
Worst-Case Optimal Algorithms for Parallel Query Processing

Authors: Paraschos Koutris, Paul Beame, and Dan Suciu


Abstract
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with p servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of (Ngo et al. 2012) for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query q when all relations have size equal to M is O(M/p^{1/psi^*}), where psi^* is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.
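
For intuition on the single-round setting, here is a sketch of HyperCube-style routing for the triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x): the p servers form a cube of side p^{1/3}, and each tuple is replicated along its one missing dimension. The equal side lengths are a simplification of ours; optimal algorithms size the dimensions per query.

import itertools

def hypercube_route_triangle(R, S, T, p):
    side = round(p ** (1 / 3))                 # one cube dimension per variable
    hx = hy = hz = lambda v: hash(v) % side
    servers = {s: {"R": [], "S": [], "T": []}
               for s in itertools.product(range(side), repeat=3)}
    for (x, y) in R:                           # R fixes x, y; replicate over z
        for z in range(side):
            servers[(hx(x), hy(y), z)]["R"].append((x, y))
    for (y, z) in S:                           # S fixes y, z; replicate over x
        for x in range(side):
            servers[(x, hy(y), hz(z))]["S"].append((y, z))
    for (z, x) in T:                           # T fixes z, x; replicate over y
        for y in range(side):
            servers[(hx(x), y, hz(z))]["T"].append((z, x))
    return servers                             # each server joins locally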

Cite as

Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{koutris_et_al:LIPIcs.ICDT.2016.8,
  author =	{Koutris, Paraschos and Beame, Paul and Suciu, Dan},
  title =	{{Worst-Case Optimal Algorithms for Parallel Query Processing}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.8},
  URN =		{urn:nbn:de:0030-drops-57771},
  doi =		{10.4230/LIPIcs.ICDT.2016.8},
  annote =	{Keywords: conjunctive query, parallel computation, worst-case bounds}
}
Document
Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation

Authors: Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick


Abstract
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query, also referred to as parallel-correctness. This paper extends the study of the complexity of parallel-correctness and its constituents, parallel-soundness and parallel-completeness, to unions of conjunctive queries with and without negation. As a by-product it is shown that the containment problem for conjunctive queries with negation is coNEXPTIME-complete.

Cite as

Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 9:1-9:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{geck_et_al:LIPIcs.ICDT.2016.9,
  author =	{Geck, Gaetano and Ketsman, Bas and Neven, Frank and Schwentick, Thomas},
  title =	{{Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{9:1--9:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.9},
  URN =		{urn:nbn:de:0030-drops-57787},
  doi =		{10.4230/LIPIcs.ICDT.2016.9},
  annote =	{Keywords: Conjunctive queries, distributed evaluation}
}
Document
A Formal Study of Collaborative Access Control in Distributed Datalog

Authors: Serge Abiteboul, Pierre Bourhis, and Victor Vianu


Abstract
We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, which allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes.

Cite as

Serge Abiteboul, Pierre Bourhis, and Victor Vianu. A Formal Study of Collaborative Access Control in Distributed Datalog. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{abiteboul_et_al:LIPIcs.ICDT.2016.10,
  author =	{Abiteboul, Serge and Bourhis, Pierre and Vianu, Victor},
  title =	{{A Formal Study of Collaborative Access Control in Distributed Datalog}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.10},
  URN =		{urn:nbn:de:0030-drops-57794},
  doi =		{10.4230/LIPIcs.ICDT.2016.10},
  annote =	{Keywords: Distributed datalog, access control, provenance}
}
Document
It's All a Matter of Degree: Using Degree Information to Optimize Multiway Joins

Authors: Manas R. Joglekar and Christopher M. Ré


Abstract
We optimize multiway equijoins on relational tables using degree information. We give a new bound that uses degree information to more tightly bound the maximum output size of a query. On real data, our bound on the number of triangles in a social network can be up to 95 times tighter than existing worst case bounds. We show that using only a constant amount of degree information, we are able to obtain join algorithms with a running time that has a smaller exponent than existing algorithms - for any database instance. We also show that this degree information can be obtained in nearly linear time, which yields asymptotically faster algorithms in the serial setting and lower communication algorithms in the MapReduce setting. In the serial setting, the data complexity of join processing can be expressed as a function O(IN^x + OUT) in terms of input size IN and output size OUT in which x depends on the query. An upper bound for x is given by fractional hypertreewidth. We are interested in situations in which we can get algorithms for which x is strictly smaller than the fractional hypertreewidth. We say that a join can be processed in subquadratic time if x < 2. Building on the AYZ algorithm for processing cycle joins in quadratic time, for a restricted class of joins which we call 1-series-parallel graphs, we obtain a complete decision procedure for identifying subquadratic solvability (subject to the 3-SUM problem requiring quadratic time). Our 3-SUM based quadratic lower bound is tight, making it the only known tight bound for joins that does not require any assumption about the matrix multiplication exponent omega. We also give a MapReduce algorithm that meets our improved communication bound and handles essentially optimal parallelism.
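
As a baseline illustration of degree information at work, here is the classic AYZ-style ordering trick that the paper starts from, sketched in Python (this is the known baseline, not the paper's new algorithm): orienting each edge towards its higher-degree endpoint bounds every out-degree by O(sqrt(m)), giving an O(m^{3/2}), i.e. subquadratic, triangle count.

from collections import defaultdict

def count_triangles(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    # Rank vertices by degree; orient edges towards the higher rank, so
    # every vertex has out-degree O(sqrt(m)).
    rank = {v: i for i, (_, v) in enumerate(sorted((len(adj[v]), v) for v in adj))}
    out = {v: [w for w in adj[v] if rank[w] > rank[v]] for v in adj}
    count = 0
    for u in adj:                              # each triangle is counted once,
        for i, v in enumerate(out[u]):         # at its lowest-ranked vertex
            for w in out[u][i + 1:]:
                if w in adj[v]:
                    count += 1
    return count

print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))   # 1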

Cite as

Manas R. Joglekar and Christopher M. Ré. It's All a Matter of Degree: Using Degree Information to Optimize Multiway Joins. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{joglekar_et_al:LIPIcs.ICDT.2016.11,
  author =	{Joglekar, Manas R. and R\'{e}, Christopher M.},
  title =	{{It's All a Matter of Degree: Using Degree Information to Optimize Multiway Joins}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.11},
  URN =		{urn:nbn:de:0030-drops-57800},
  doi =		{10.4230/LIPIcs.ICDT.2016.11},
  annote =	{Keywords: Joins, Degree, MapReduce}
}
Document
Filtering With the Crowd: CrowdScreen Revisited

Authors: Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo


Abstract
Filtering a set of items, based on a set of properties that can be verified by humans, is a common application of CrowdSourcing. When the workers are error-prone, each item is presented to multiple users, to limit the probability of misclassification. Since the Crowd is a relatively expensive resource, minimizing the number of questions per item may naturally result in big savings. Several algorithms to address this minimization problem have been presented in the CrowdScreen framework by Parameswaran et al. However, those algorithms do not scale well and therefore cannot be used in scenarios where high accuracy is required in spite of high user error rates. The goal of this paper is thus to devise algorithms that can cope with such situations. To achieve this, we provide new theoretical insights to the problem, then use them to develop a new efficient algorithm. We also propose novel optimizations for the algorithms of CrowdScreen that improve their scalability. We complement our theoretical study by an experimental evaluation of the algorithms on a large set of synthetic parameters as well as real-life crowdsourcing scenarios, demonstrating the advantages of our solution.
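
Since strategies in this line of work are rooted in Wald's sequential probability ratio test (compare the paper's keywords), a minimal per-item SPRT sketch may help fix ideas; the error rate and thresholds below are illustrative values of ours.

import math, random

def sprt_filter(ask_worker, error=0.2, alpha=0.05, beta=0.05, max_votes=50):
    # H1: the item passes the filter, H0: it does not; each worker vote is
    # correct with probability 1 - error. Stop when the log-likelihood
    # ratio leaves the Wald interval (log B, log A).
    a = math.log((1 - beta) / alpha)           # accept-H1 boundary
    b = math.log(beta / (1 - alpha))           # accept-H0 boundary
    step = math.log((1 - error) / error)       # LLR shift per vote
    llr = 0.0
    for _ in range(max_votes):
        llr += step if ask_worker() else -step
        if llr >= a:
            return True
        if llr <= b:
            return False
    return llr > 0                             # budget exhausted: majority sign

# Simulated workers with 20% error for an item whose true label is True:
print(sprt_filter(lambda: random.random() < 0.8))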

Cite as

Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo. Filtering With the Crowd: CrowdScreen Revisited. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{groz_et_al:LIPIcs.ICDT.2016.12,
  author =	{Groz, Benoit and Levin, Ezra and Meilijson, Isaac and Milo, Tova},
  title =	{{Filtering With the Crowd: CrowdScreen Revisited}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.12},
  URN =		{urn:nbn:de:0030-drops-57817},
  doi =		{10.4230/LIPIcs.ICDT.2016.12},
  annote =	{Keywords: CrowdSourcing, filtering, algorithms, sprt, hypothesis testing}
}
Document
Streaming Partitioning of Sequences and Trees

Authors: Christian Konrad


Abstract
We study streaming algorithms for partitioning integer sequences and trees. In the case of trees, we suppose that the input tree is provided by a stream consisting of a depth-first-traversal of the input tree. This captures the problem of partitioning XML streams, among other problems. We show that both problems admit deterministic (1+epsilon)-approximation streaming algorithms, where a single pass is sufficient for integer sequences and two passes are required for trees. The space complexity for partitioning integer sequences is O((1/epsilon) * p * log(nm)) and for partitioning trees is O((1/epsilon) * p^2 * log(nm)), where n is the length of the input stream, m is the maximal weight of an element in the stream, and p is the number of partitions to be created. Furthermore, for the problem of partitioning integer sequences, we show that computing an optimal solution in one pass requires Omega(n) space, and computing a (1+epsilon)-approximation in one pass requires Omega((1/epsilon) * log(n)) space, rendering our algorithm tight for instances with p,m in O(1).
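
The offline core of the problem is easy to state and helps in reading the bounds above: a greedy scan decides whether p contiguous pieces of weight at most B suffice, and a binary search over B finds the optimum. The sketch below is this folklore offline version, not the paper's streaming algorithm, which approximates the check in small space.

def feasible(seq, p, bound):
    # Greedy: cut as late as possible; this minimizes the number of pieces.
    pieces, cur = 1, 0
    for w in seq:
        if w > bound:
            return False
        if cur + w > bound:
            pieces, cur = pieces + 1, 0
        cur += w
    return pieces <= p

def optimal_bottleneck(seq, p):
    lo, hi = max(seq), sum(seq)                # search range for B
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(seq, p, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(optimal_bottleneck([3, 1, 4, 1, 5, 9, 2, 6], p=3))   # 14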

Cite as

Christian Konrad. Streaming Partitioning of Sequences and Trees. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{konrad:LIPIcs.ICDT.2016.13,
  author =	{Konrad, Christian},
  title =	{{Streaming Partitioning of Sequences and Trees}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.13},
  URN =		{urn:nbn:de:0030-drops-57829},
  doi =		{10.4230/LIPIcs.ICDT.2016.13},
  annote =	{Keywords: Streaming Algorithms, XML Documents, Data Partitioning, Communication Complexity}
}
Document
Dynamic Graph Queries

Authors: Pablo Muñoz, Nils Vortmeier, and Thomas Zeume


Abstract
Graph databases in many applications - semantic web, transport or biological networks among others - are not only large, but also frequently modified. Evaluating graph queries in this dynamic context is a challenging task, as those queries often combine first-order and navigational features. Motivated by recent results on maintaining dynamic reachability, we study the dynamic evaluation of traditional query languages for graphs in the descriptive complexity framework. Our focus is on maintaining regular path queries, and extensions thereof, by first-order formulas. In particular we are interested in path queries defined by non-regular languages and in extended conjunctive regular path queries (which allow to compare labels of paths based on word relations). Further we study the closely related problems of maintaining distances in graphs and reachability in product graphs. In this preliminary study we obtain upper bounds for those problems in restricted settings, such as undirected and acyclic graphs, or under insertions only, and negative results regarding quantifier-free update formulas. In addition we point out interesting directions for further research.

Cite as

Pablo Muñoz, Nils Vortmeier, and Thomas Zeume. Dynamic Graph Queries. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{munoz_et_al:LIPIcs.ICDT.2016.14,
  author =	{Mu\~{n}oz, Pablo and Vortmeier, Nils and Zeume, Thomas},
  title =	{{Dynamic Graph Queries}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.14},
  URN =		{urn:nbn:de:0030-drops-57830},
  doi =		{10.4230/LIPIcs.ICDT.2016.14},
  annote =	{Keywords: Dynamic descriptive complexity, graph databases, graph products, reachability, path queries}
}
Document
Verification of Evolving Graph-structured Data under Expressive Path Constraints

Authors: Diego Calvanese, Magdalena Ortiz, and Mantas Šimkus


Abstract
Integrity constraints play a central role in databases and, among other applications, are fundamental for preserving data integrity when databases evolve as a result of operations manipulating the data. In this context, an important task is that of static verification, which consists in deciding whether a given set of constraints is preserved after the execution of a given sequence of operations, for every possible database satisfying the initial constraints. In this paper, we consider constraints over graph-structured data formulated in an expressive Description Logic (DL) that allows for regular expressions over binary relations and their inverses, generalizing many of the well-known path constraint languages proposed for semi-structured data in the last two decades. In this setting, we study the problem of static verification, for operations expressed in a simple yet flexible language built from additions and deletions of complex DL expressions. We establish undecidability of the general setting, and identify suitable restricted fragments for which we obtain tight complexity results, building on techniques developed in our previous work for simpler DLs. As a by-product, we obtain new (un)decidability results for the implication problem of path constraints, and improve previous upper bounds on the complexity of the problem.

Cite as

Diego Calvanese, Magdalena Ortiz, and Mantas Šimkus. Verification of Evolving Graph-structured Data under Expressive Path Constraints. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{calvanese_et_al:LIPIcs.ICDT.2016.15,
  author =	{Calvanese, Diego and Ortiz, Magdalena and \v{S}imkus, Mantas},
  title =	{{Verification of Evolving Graph-structured Data under Expressive Path Constraints}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.15},
  URN =		{urn:nbn:de:0030-drops-57843},
  doi =		{10.4230/LIPIcs.ICDT.2016.15},
  annote =	{Keywords: Path constraints, Description Logics, Graph databases, Static verification}
}
Document
Query Stability in Monotonic Data-Aware Business Processes

Authors: Ognjen Savkovic, Elisa Marengo, and Werner Nutt


Abstract
Organizations continuously accumulate data, often according to some business processes. If one poses a query over such data for decision support, it is important to know whether the query is stable, that is, whether the answers will stay the same or may change in the future because business processes may add further data. We investigate query stability for conjunctive queries. To this end, we define a formalism that combines an explicit representation of the control flow of a process with a specification of how data is read and inserted into the database. We consider different restrictions of the process model and the state of the system, such as negation in conditions, cyclic executions, read access to written data, presence of pending process instances, and the possibility to start fresh process instances. We identify for which restriction combinations stability of conjunctive queries is decidable and provide encodings into variants of Datalog that are optimal with respect to the worst-case complexity of the problem.

Cite as

Ognjen Savkovic, Elisa Marengo, and Werner Nutt. Query Stability in Monotonic Data-Aware Business Processes. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 16:1-16:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{savkovic_et_al:LIPIcs.ICDT.2016.16,
  author =	{Savkovic, Ognjen and Marengo, Elisa and Nutt, Werner},
  title =	{{Query Stability in Monotonic Data-Aware Business Processes}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{16:1--16:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.16},
  URN =		{urn:nbn:de:0030-drops-57851},
  doi =		{10.4230/LIPIcs.ICDT.2016.16},
  annote =	{Keywords: Business Processes, Query Stability}
}
Document
Document Spanners: From Expressive Power to Decision Problems

Authors: Dominik D. Freydenberger and Mario Holldack


Abstract
We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
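
Readers who know regular expressions with named capture groups have already seen the regex-formula core: each variable is bound to a span (start and end positions) of the input string, not to the substring itself. A small Python illustration with our own example string follows; core spanners then add projection, join, union, and string-equality selection on top.

import re

doc = "Alice lives in Paris. Bob lives in Lyon."
# Named groups play the role of the spanner's span variables:
pattern = re.compile(r"(?P<person>[A-Z][a-z]+) lives in (?P<city>[A-Z][a-z]+)")

for m in pattern.finditer(doc):
    print({var: m.span(var) for var in ("person", "city")})
# {'person': (0, 5), 'city': (15, 20)}
# {'person': (22, 25), 'city': (35, 39)}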

Cite as

Dominik D. Freydenberger and Mario Holldack. Document Spanners: From Expressive Power to Decision Problems. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2016.17,
  author =	{Freydenberger, Dominik D. and Holldack, Mario},
  title =	{{Document Spanners: From Expressive Power to Decision Problems}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.17},
  URN =		{urn:nbn:de:0030-drops-57867},
  doi =		{10.4230/LIPIcs.ICDT.2016.17},
  annote =	{Keywords: Information extraction, document spanners, regular expressions, regex, patterns, word equations, decision problems, descriptional complexity}
}
Document
Algorithms for Provisioning Queries and Analytics

Authors: Sepehr Assadi, Sanjeev Khanna, Yang Li, and Val Tannen


Abstract
Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates k hypotheticals, each retaining some of the tuples of a database instance, possibly overlapping, and she wishes to answer the query under scenarios, where a scenario is defined by a subset of the hypotheticals that are "turned on". We say that a query admits compact provisioning if given any database instance and any k hypotheticals, one can create a poly-size (in k) sketch that can then be used to answer the query under any of the 2^k possible scenarios without accessing the original instance. In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and statistics/analytics (the numerical component). We first show that queries that compute quantiles or linear regression (as well as simpler queries that compute count and sum/average of positive values) can be compactly provisioned to provide (multiplicative) approximate answers to an arbitrary precision. In contrast, exact provisioning for each of these statistics requires the sketch size to be exponential in k. We then establish that for any complex query whose logical component is a positive relational algebra query, as long as the numerical component can be compactly provisioned, the complex query itself can be compactly provisioned. On the other hand, introducing negation or recursion in the logical component again requires the sketch size to be exponential in k. While our positive results use algorithms that do not access the original instance after a scenario is known, we prove our lower bounds even for the case when, knowing the scenario, limited access to the instance is allowed.

Cite as

Sepehr Assadi, Sanjeev Khanna, Yang Li, and Val Tannen. Algorithms for Provisioning Queries and Analytics. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{assadi_et_al:LIPIcs.ICDT.2016.18,
  author =	{Assadi, Sepehr and Khanna, Sanjeev and Li, Yang and Tannen, Val},
  title =	{{Algorithms for Provisioning Queries and Analytics}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.18},
  URN =		{urn:nbn:de:0030-drops-57877},
  doi =		{10.4230/LIPIcs.ICDT.2016.18},
  annote =	{Keywords: What-if Analysis, Provisioning, Data Compression, Approximate Query Answering}
}
Document
Limits of Schema Mappings

Authors: Phokion G. Kolaitis, Reinhard Pichler, Emanuel Sallinger, and Vadim Savenkov


Abstract
Schema mappings have been extensively studied in the context of data exchange and data integration, where they have turned out to be the right level of abstraction for formalizing data inter-operability tasks. Up to now and for the most part, schema mappings have been studied as static objects, in the sense that each time the focus has been on a single schema mapping of interest or, in the case of composition, on a pair of schema mappings of interest. In this paper, we adopt a dynamic viewpoint and embark on a study of sequences of schema mappings and of the limiting behavior of such sequences. To this effect, we first introduce a natural notion of distance on sets of finite target instances that expresses how "close" two sets of target instances are as regards the certain answers of conjunctive queries on these sets. Using this notion of distance, we investigate pointwise limits and uniform limits of sequences of schema mappings, as well as the companion notions of pointwise Cauchy and uniformly Cauchy sequences of schema mappings. We obtain a number of results about the limits of sequences of GAV schema mappings and the limits of sequences of LAV schema mappings that reveal striking differences between these two classes of schema mappings. We also consider the completion of the metric space of sets of target instances and obtain concrete representations of limits of sequences of schema mappings in terms of generalized schema mappings, i.e., schema mappings with infinite target instances as solutions to (finite) source instances.

Cite as

Phokion G. Kolaitis, Reinhard Pichler, Emanuel Sallinger, and Vadim Savenkov. Limits of Schema Mappings. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 19:1-19:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{kolaitis_et_al:LIPIcs.ICDT.2016.19,
  author =	{Kolaitis, Phokion G. and Pichler, Reinhard and Sallinger, Emanuel and Savenkov, Vadim},
  title =	{{Limits of Schema Mappings}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{19:1--19:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.19},
  URN =		{urn:nbn:de:0030-drops-57882},
  doi =		{10.4230/LIPIcs.ICDT.2016.19},
  annote =	{Keywords: Limit, Pointwise convergence, Uniform convergence, Schema mapping}
}
Document
Reasoning About Integrity Constraints for Tree-Structured Data

Authors: Wojciech Czerwinski, Claire David, Filip Murlak, and Pawel Parys


Abstract
We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential, when conjunctive queries are replaced by tree patterns, and the number of data comparisons is bounded.

Cite as

Wojciech Czerwinski, Claire David, Filip Murlak, and Pawel Parys. Reasoning About Integrity Constraints for Tree-Structured Data. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 20:1-20:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{czerwinski_et_al:LIPIcs.ICDT.2016.20,
  author =	{Czerwinski, Wojciech and David, Claire and Murlak, Filip and Parys, Pawel},
  title =	{{Reasoning About Integrity Constraints for Tree-Structured Data}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{20:1--20:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.20},
  URN =		{urn:nbn:de:0030-drops-57897},
  doi =		{10.4230/LIPIcs.ICDT.2016.20},
  annote =	{Keywords: data trees, integrity constraints, unions of conjunctive queries, schema mappings, entailment, containment, consistency}
}
Document
Complexity of Repair Checking and Consistent Query Answering

Authors: Sebastian Arming, Reinhard Pichler, and Emanuel Sallinger


Abstract
Inconsistent databases (i.e., databases violating some given set of integrity constraints) may arise in many applications such as, for instance, data integration. Hence, the handling of inconsistent data has evolved as an active field of research. In this paper, we consider two fundamental problems in this context: Repair Checking (RC) and Consistent Query Answering (CQA). So far, these problems have been mainly studied from the point of view of data complexity (where all parts of the input except for the database are considered as fixed). While for some kinds of integrity constraints, also combined complexity (where all parts of the input are allowed to vary) has been considered, for several other kinds of integrity constraints, combined complexity has been left unexplored. Moreover, a more detailed analysis (keeping other parts of the input fixed - e.g., the constraints only) is completely missing. The goal of our work is a thorough analysis of the complexity of the RC and CQA problems. Our contribution is a complete picture of the complexity of these problems for a wide range of integrity constraints. Our analysis thus allows us to get a better understanding of the true sources of complexity.
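
To fix terminology: a repair is (in the subset-repair setting) a maximal consistent subset of the database, RC asks whether a given subset is such a repair, and CQA asks whether a query holds in every repair. The brute-force semantics, for one key constraint of our own choosing, fits in a few lines; it is exponential and purely illustrative of the definitions, not of the paper's methods.

from itertools import combinations

db = [("emp", "alice", "sales"), ("emp", "alice", "hr"), ("emp", "bob", "hr")]

def consistent(facts):                         # key: (rel, name) -> dept
    seen = {}
    return all(seen.setdefault((r, k), v) == v for r, k, v in facts)

def repairs(d):                                # maximal consistent subsets
    subs = [frozenset(c) for r in range(len(d) + 1)
            for c in combinations(d, r) if consistent(c)]
    return [s for s in subs if not any(s < t for t in subs)]

def certain(query, d):                         # CQA: true in every repair?
    return all(query(rep) for rep in repairs(d))

print(certain(lambda rep: any(t[1] == "bob" for t in rep), db))    # True
print(certain(lambda rep: any(t[2] == "sales" for t in rep), db))  # False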

Cite as

Sebastian Arming, Reinhard Pichler, and Emanuel Sallinger. Complexity of Repair Checking and Consistent Query Answering. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{arming_et_al:LIPIcs.ICDT.2016.21,
  author =	{Arming, Sebastian and Pichler, Reinhard and Sallinger, Emanuel},
  title =	{{Complexity of Repair Checking and Consistent Query Answering}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.21},
  URN =		{urn:nbn:de:0030-drops-57900},
  doi =		{10.4230/LIPIcs.ICDT.2016.21},
  annote =	{Keywords: inconsistency, consistent query answering, complexity}
}
Document
On the Complexity of Enumerating the Answers to Well-designed Pattern Trees

Authors: Markus Kröll, Reinhard Pichler, and Sebastian Skritek


Abstract
Well-designed pattern trees (wdPTs) have been introduced as an extension of conjunctive queries to allow for partial matching - analogously to the OPTIONAL operator of the semantic web query language SPARQL. Several computational problems of wdPTs have been studied in recent years, such as the evaluation problem in various settings, the counting problem, as well as static analysis tasks including the containment and equivalence problems. Restrictions needed to achieve tractability of these tasks have also been proposed. In contrast, the problem of enumerating the answers to a wdPT has been largely ignored so far. In this work, we embark on a systematic study of the complexity of the enumeration problem of wdPTs. As our main result, we identify several tractable and intractable cases of this problem both from a classical complexity point of view and from a parameterized complexity point of view.

Cite as

Markus Kröll, Reinhard Pichler, and Sebastian Skritek. On the Complexity of Enumerating the Answers to Well-designed Pattern Trees. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{kroll_et_al:LIPIcs.ICDT.2016.22,
  author =	{Kr\"{o}ll, Markus and Pichler, Reinhard and Skritek, Sebastian},
  title =	{{On the Complexity of Enumerating the Answers to Well-designed Pattern Trees}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.22},
  URN =		{urn:nbn:de:0030-drops-57912},
  doi =		{10.4230/LIPIcs.ICDT.2016.22},
  annote =	{Keywords: SPARQL, Pattern Trees, CQs, Enumeration, Complexity}
}
Document
A Practically Efficient Algorithm for Generating Answers to Keyword Search Over Data Graphs

Authors: Konstantin Golenberg and Yehoshua Sagiv


Abstract
In keyword search over a data graph, an answer is a non-redundant subtree that contains all the keywords of the query. A naive approach to producing all the answers by increasing height is to generalize Dijkstra's algorithm to enumerating all acyclic paths by increasing weight. The idea of freezing is introduced so that (most) non-shortest paths are generated only if they are actually needed for producing answers. The resulting algorithm for generating subtrees, called GTF, is subtle and its proof of correctness is intricate. Extensive experiments show that GTF outperforms existing systems, even ones that for efficiency's sake are incomplete (i.e., cannot produce all the answers). In particular, GTF is scalable and performs well even on large data graphs and when many answers are needed.
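
The naive baseline that the abstract mentions, enumerating acyclic paths by increasing weight with a Dijkstra-like heap, is a few lines of Python; this illustrates only the baseline (the example graph is ours), while freezing is the paper's contribution on top of it.

import heapq

def paths_by_weight(graph, source):
    # graph: node -> list of (neighbour, edge_weight)
    heap = [(0, (source,))]
    while heap:
        w, path = heapq.heappop(heap)          # next-lightest acyclic path
        yield w, path
        for v, c in graph.get(path[-1], []):
            if v not in path:                  # extend without cycles
                heapq.heappush(heap, (w + c, path + (v,)))

g = {"a": [("b", 1), ("c", 3)], "b": [("c", 1)]}
for w, p in paths_by_weight(g, "a"):
    print(w, "->".join(p))        # 0 a / 1 a->b / 2 a->b->c / 3 a->c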

Cite as

Konstantin Golenberg and Yehoshua Sagiv. A Practically Efficient Algorithm for Generating Answers to Keyword Search Over Data Graphs. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 23:1-23:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{golenberg_et_al:LIPIcs.ICDT.2016.23,
  author =	{Golenberg, Konstantin and Sagiv, Yehoshua},
  title =	{{A Practically Efficient Algorithm for Generating Answers to Keyword Search Over Data Graphs}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{23:1--23:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.23},
  URN =		{urn:nbn:de:0030-drops-57923},
  doi =		{10.4230/LIPIcs.ICDT.2016.23},
  annote =	{Keywords: Keyword search over data graphs, subtree enumeration by height, top-k answers, efficiency}
}
