eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
0
0
10.4230/LIPIcs.ICDT.2015
article
LIPIcs, Volume 31, ICDT'15, Complete Volume
Arenas, Marcelo
Ugarte, Martín
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015/LIPIcs.ICDT.2015.pdf
Database Management, Normal forms, Schema and subschema, Query languages, Query processing, Relational databases, Distributed databases, Heterogeneous Databases, Online Information Services, Miscellaneous – Privacy, Office Automation: Workflow management, Performance Analysis and Design Aids: Formal
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
i
xvi
10.4230/LIPIcs.ICDT.2015.i
article
Title, Table of Contents, Preface, ICDT 2015 Test of Time Award, Organization, External Reviewers, List of Authors
Arenas, Marcelo
Ugarte, Martín
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.i/LIPIcs.ICDT.2015.i.pdf
Title
Table of Contents
Preface
ICDT 2015 Test of Time Award
Organization
External Reviewers
List of Authors
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
1
12
10.4230/LIPIcs.ICDT.2015.1
article
The Confounding Problem of Private Data Release (Invited Talk)
Cormode, Graham
The demands to make data available are growing ever louder, including open data initiatives and "data monetization". But the problem of doing so without disclosing confidential information is a subtle and difficult one. Is "private data release" an oxymoron? This paper (accompanying an invited talk) aims to delve into the motivations of data release, explore the challenges, and outline some of the current statistical approaches developed in response to this confounding problem.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.1/LIPIcs.ICDT.2015.1.pdf
privacy
anonymization
data release
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
13
14
10.4230/LIPIcs.ICDT.2015.13
article
Using Locality for Efficient Query Evaluation in Various Computation Models (Invited Talk)
Schweikardt, Nicole
In the database theory and logic literature, different notions of locality of queries have been studied, the most prominent being Hanf locality and Gaifman locality. These notions are designed so that, in order to evaluate a local query in a given database, it suffices to look only at small neighbourhoods around tuples of elements that belong to the database.
In this talk I want to give a survey of how to use locality for efficient query evaluation in various computation models. In particular, we will take a closer look at how to enumerate query results with constant delay, and at how to evaluate queries in a map-reduce-like setting [Neven et al., ICDT 2015] or in Pregel [Malewicz et al., SIGMOD 2010]. Also, we will have a closer look at how to transform a given local query into a form suitable for exploiting its locality.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.13/LIPIcs.ICDT.2015.13.pdf
query evaluation
locality
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
15
24
10.4230/LIPIcs.ICDT.2015.15
article
Large-Scale Similarity Joins With Guarantees (Invited Talk)
Pagh, Rasmus
The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating probability of error and improving the locality of data access.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.15/LIPIcs.ICDT.2015.15.pdf
Similarity join
filtering
locality-sensitive hashing
recall
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
25
43
10.4230/LIPIcs.ICDT.2015.25
article
A Declarative Framework for Linking Entities
Burdick, Douglas
Fagin, Ronald
Kolaitis, Phokion G.
Popa, Lucian
Tan, Wang-Chiew
The aim of this paper is to introduce and develop a truly declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on the systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons as to why two entities should be linked. We also consider extensions of the core language that capture collective entity resolution, by allowing inter-dependence between links.
We identify a class of "good" solutions for entity linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the "good" solutions, and the problem of finding the certain links, which are the links that appear in every "good" solution. We show that these problems are tractable for the core language, but may become intractable once we allow inter-dependence between link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.25/LIPIcs.ICDT.2015.25.pdf
entity linking
entity resolution
constraints
certain links
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
44
59
10.4230/LIPIcs.ICDT.2015.44
article
Asymptotic Determinacy of Path Queries using Union-of-Paths Views
Francis, Nadime
We consider the view determinacy problem over graph databases for queries defined as (possibly infinite) unions of path queries. These queries select pairs of nodes in a graph that are connected through a path whose length falls in a given set. A view specification is a set of such queries. We say that a view specification V determines a query Q if, for all databases D, the answers to V on D contain enough information to answer Q.
Our main result states that, given a view V, there exists an explicit bound that depends on V such that we can decide the determinacy problem for all queries that ask for a path longer than this bound, and provide first-order rewritings for the queries that are determined. We call this notion asymptotic determinacy. As a corollary, we can also compute the set of almost all path queries that are determined by V.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.44/LIPIcs.ICDT.2015.44.pdf
Graph databases
Views
Determinacy
Rewriting
Path queries
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
60
75
10.4230/LIPIcs.ICDT.2015.60
article
Games for Active XML Revisited
Schuster, Martin
Schwentick, Thomas
The paper studies the rewriting mechanisms for intensional documents in the Active XML framework, abstracted in the form of active context-free games. The safe rewriting problem studied in this paper is to decide whether the first player, Juliet, has a winning strategy for a given game and (nested) word; this corresponds to a successful rewriting strategy for a given intensional document. The paper examines several extensions to active context-free games.
The primary extension allows more expressive schemas (namely XML schemas and regular nested word languages) for both target and replacement languages and has the effect that games are played on nested words instead of (flat) words as in previous studies. Other extensions consider validation of input parameters of web services, and an alternative semantics based on insertion of service call results.
In general, the complexity of the safe rewriting problem is highly intractable (doubly exponential time), but the paper identifies interesting tractable cases.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.60/LIPIcs.ICDT.2015.60.pdf
Active XML
Computational Complexity
Nested Words
Rewriting Games
Semistructured Data
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
76
93
10.4230/LIPIcs.ICDT.2015.76
article
Answering Conjunctive Queries with Inequalities
Koutris, Paraschos
Milo, Tova
Roy, Sudeepa
Suciu, Dan
In this paper, we study the complexity of answering conjunctive queries (CQ) with inequalities. In particular, we compare the complexity of the query with and without inequalities. The main contribution of our work is a novel combinatorial technique that enables the use of any Select-Project-Join query plan for a given CQ without inequalities in answering the CQ with inequalities, with an additional factor in running time that only depends on the query. To achieve this, we define a new projection operator that keeps a small representation (independent of the size of the database) of the set of input tuples that map to each tuple in the output of the projection; this representation is used to evaluate all the inequalities in the query. Second, we generalize a result by Papadimitriou-Yannakakis [PODS'97] and give an alternative algorithm based on the color-coding technique [Alon, Yuster and Zwick, PODS'02] to evaluate a CQ with inequalities by using an algorithm for the CQ without inequalities. Third, we investigate the structure of the query graph, inequality graph, and the augmented query graph with inequalities, and show that even if the query and the inequality graphs have bounded treewidth, the augmented graph not only can have an unbounded treewidth but can also be NP-hard to evaluate. Further, we illustrate classes of queries and inequalities where the augmented graphs have unbounded treewidth, but the CQ with inequalities can be evaluated in poly-time. Finally, we give necessary properties and sufficient properties that allow a class of CQs to have poly-time combined complexity with respect to any inequality pattern.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.76/LIPIcs.ICDT.2015.76.pdf
query evaluation
conjunctive query
inequality
treewidth
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
94
109
10.4230/LIPIcs.ICDT.2015.94
article
SQL's Three-Valued Logic and Certain Answers
Libkin, Leonid
SQL uses three-valued logic for evaluating queries on databases with nulls. The standard theoretical approach to evaluating queries on incomplete databases is to compute certain answers. While these two cannot coincide, due to a significant complexity mismatch, we can still ask whether the two schemes are related in any way. For instance, does SQL always produce answers we can be certain about?
This is not so: SQL's semantics and the certain answers semantics can be totally unrelated. We show, however, that a slight modification of the three-valued semantics for relational calculus queries can provide the required certainty guarantees. The key point of the new scheme is to fully utilize the three-valued semantics, and classify answers not into certain or non-certain, as was done before, but rather into certainly true, certainly false, or unknown. This yields relatively small changes to the evaluation procedure, which we consider at the level of both declarative (relational calculus) and procedural (relational algebra) queries. We also introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values.
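The certainly-true / certainly-false / unknown classification can be illustrated with a small sketch in Kleene three-valued logic (a toy illustration, not the paper's evaluation procedure; all names here are made up), with Python's None playing the role of SQL's NULL/unknown:

```python
# Kleene three-valued logic over {True, False, None},
# where None stands for "unknown" (SQL's NULL).

def and3(a, b):
    # false AND anything is false; otherwise unknown dominates
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    # true OR anything is true; otherwise unknown dominates
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else (not a)

def eq3(x, y):
    # SQL-style equality: any comparison involving NULL is unknown
    if x is None or y is None:
        return None
    return x == y

# A tuple (1, NULL) evaluated against the condition  a = 1 AND b = 2:
row = (1, None)
cond = and3(eq3(row[0], 1), eq3(row[1], 2))
# cond is None: the tuple is neither certainly in nor certainly out.
```

Under this scheme an answer tuple is returned as certain only when its condition evaluates to True, and certainly excluded only when it evaluates to False.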
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.94/LIPIcs.ICDT.2015.94.pdf
Null values
incomplete information
query evaluation
three-valued logic
certain answers
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
110
126
10.4230/LIPIcs.ICDT.2015.110
article
A Trichotomy in the Complexity of Counting Answers to Conjunctive Queries
Chen, Hubie
Mengel, Stefan
Conjunctive queries are basic and heavily studied database queries; in relational algebra, they are the select-project-join queries. In this article, we study the fundamental problem of counting, given a conjunctive query and a relational database, the number of answers to the query on the database. In particular, we study the complexity of this problem relative to sets of conjunctive queries. We present a trichotomy theorem, which shows essentially that this problem on a set of conjunctive queries is either tractable, equivalent to the parameterized CLIQUE problem, or as hard as the parameterized counting CLIQUE problem; the criterion describing which of these situations occurs is simply stated, in terms of graph-theoretic conditions.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.110/LIPIcs.ICDT.2015.110.pdf
database theory
query answering
conjunctive queries
counting complexity
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
127
143
10.4230/LIPIcs.ICDT.2015.127
article
Learning Tree Patterns from Example Graphs
Cohen, Sara
Weiss, Yaacov Y.
This paper investigates the problem of learning tree patterns that return nodes with a given set of labels, from example graphs provided by the user. Example graphs are annotated by the user as being either positive or negative. The goal is then to determine whether there exists a tree pattern returning tuples of nodes with the given labels in each of the positive examples, but in none of the negative examples, and, furthermore, to find one such pattern if it exists. These are called the satisfiability and learning problems, respectively.
This paper thoroughly investigates the satisfiability and learning problems in a variety of settings. In particular, we consider example sets that (1) may contain only positive examples, or both positive and negative examples, (2) may contain directed or undirected graphs, and (3) may have multiple occurrences of labels or be uniquely labeled (to some degree). In addition, we consider tree patterns of different types that can allow, or prohibit, wildcard labeled nodes and descendant edges. We also consider two different semantics for mapping tree patterns to graphs. The complexity of satisfiability is determined for the different combinations of settings. For cases in which satisfiability is polynomial, it is also shown that learning is polynomial (this is non-trivial, as satisfying patterns may be exponential in size). Finally, the minimal learning problem, i.e., that of finding a minimal-sized satisfying pattern, is studied for cases in which satisfiability is polynomial.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.127/LIPIcs.ICDT.2015.127.pdf
tree patterns
learning
examples
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
144
160
10.4230/LIPIcs.ICDT.2015.144
article
Characterizing XML Twig Queries with Examples
Staworko, Slawek
Wieczorek, Piotr
Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with?
We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that they are not only characterizable, but characterizable with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.144/LIPIcs.ICDT.2015.144.pdf
Query characterization
Query examples
Query fitting
Twig queries
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
161
176
10.4230/LIPIcs.ICDT.2015.161
article
The Product Homomorphism Problem and Applications
ten Cate, Balder
Dalmau, Victor
The product homomorphism problem (PHP) takes as input a finite collection of structures A_1, ..., A_n and a structure B, and asks whether there is a homomorphism from the direct product A_1 x A_2 x ... x A_n to B. We pinpoint the computational complexity of this problem. Our motivation stems from the fact that PHP naturally arises in different areas of database theory. In particular, it is equivalent to the problem of determining whether a relation is definable by a conjunctive query, and to the problem of deciding whether there exists a schema mapping that fits a given collection of positive and negative data examples. We apply our results to obtain complexity bounds for these problems.
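The PHP itself is easy to state operationally. The following brute-force sketch is purely illustrative (exponential time, restricted to digraphs, i.e., a single binary relation; it is not the paper's contribution, which is the complexity analysis):

```python
from itertools import product

def graph_product(graphs):
    """Direct (categorical) product of digraphs given as edge sets:
    a pair of node tuples is an edge iff every component pair is an edge."""
    edges = set()
    for es in product(*graphs):
        edges.add((tuple(e[0] for e in es), tuple(e[1] for e in es)))
    return edges

def has_hom(g, h):
    """Brute-force homomorphism test between digraphs (tiny inputs only):
    try every mapping of g's nodes to h's nodes."""
    dg = sorted({x for e in g for x in e})
    dh = sorted({x for e in h for x in e})
    for f in product(dh, repeat=len(dg)):
        m = dict(zip(dg, f))
        if all((m[u], m[v]) in h for (u, v) in g):
            return True
    return False

# A PHP instance: is there a homomorphism from A_1 x A_2 to B?
A1, A2 = {(0, 1)}, {(0, 1)}
B = {('x', 'x')}                           # a single self-loop
ok = has_hom(graph_product([A1, A2]), B)   # True: map every node to 'x'
```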
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.161/LIPIcs.ICDT.2015.161.pdf
Homomorphisms
Direct Product
Data Examples
Definability
Conjunctive Queries
Schema Mappings
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
177
194
10.4230/LIPIcs.ICDT.2015.177
article
Regular Queries on Graph Databases
Reutter, Juan L.
Romero, Miguel
Vardi, Moshe Y.
Graph databases are currently one of the most popular paradigms for storing data. One of the key conceptual differences between graph and relational databases is the focus on navigational queries that ask whether some nodes are connected by paths satisfying certain restrictions. This focus has driven the definition of several different query languages and the subsequent study of their fundamental properties.
We define the graph query language of Regular Queries, which is a natural extension of unions of conjunctive 2-way regular path queries (UC2RPQs) and unions of conjunctive nested 2-way regular path queries (UCN2RPQs). Regular queries allow expressing complex regular patterns between nodes. We formalize regular queries as nonrecursive Datalog programs with transitive closure rules. This language has been previously considered, but its algorithmic properties are not well understood.
Our main contribution is to show elementary tight bounds for the containment problem for regular queries. Specifically, we show that this problem is 2EXPSPACE-complete. For all extensions of regular queries known to date, the containment problem turns out to be non-elementary. Together with the fact that evaluating regular queries is not harder than evaluating UCN2RPQs, our results show that regular queries achieve a good balance between expressiveness and complexity, and constitute a well-behaved class that deserves further investigation.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.177/LIPIcs.ICDT.2015.177.pdf
graph databases
conjunctive regular path queries
regular queries
containment
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
195
211
10.4230/LIPIcs.ICDT.2015.195
article
Complexity and Expressiveness of ShEx for RDF
Staworko, Slawek
Boneva, Iovka
Labra Gayo, Jose E.
Hym, Samuel
Prud'hommeaux, Eric G.
Solbrig, Harold
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. A ShEx assigns types to the nodes of an RDF graph and allows one to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.195/LIPIcs.ICDT.2015.195.pdf
RDF
Schema
Graph topology
Validation
Complexity
Expressiveness
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
212
229
10.4230/LIPIcs.ICDT.2015.212
article
CONSTRUCT Queries in SPARQL
Kostylev, Egor V.
Reutter, Juan L.
Ugarte, Martín
SPARQL has become the most popular language for querying RDF datasets, the standard data model for representing information in the Web. This query language has received a good deal of attention in the last few years: two versions of W3C standards have been issued, several SPARQL query engines have been deployed, and important theoretical foundations have been laid. However, many fundamental aspects of SPARQL queries are not yet fully understood. To this end, it is crucial to understand the correspondence between SPARQL and well-developed frameworks like relational algebra or first order logic. But one of the main obstacles on the way to such understanding is the fact that the well-studied fragments of SPARQL do not produce RDF as output.
In this paper we embark on the study of SPARQL CONSTRUCT queries, that is, queries which output RDF graphs. This class of queries takes its rightful place in the standards and implementations, but contrary to SELECT queries, it has not yet attracted worthwhile theoretical research. Under this framework we are able to establish a strong connection between SPARQL and well-known logical and database formalisms. In particular, the fragment which does not allow for blank nodes in output templates corresponds to first order queries, its well-designed sub-fragment corresponds to positive first order queries, and the general language can be re-stated as a data exchange setting. These correspondences allow us to conclude that the general language is not composable, but the aforementioned blank-free fragments are. Finally, we enrich SPARQL with a recursion operator and establish fundamental properties of this extension.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.212/LIPIcs.ICDT.2015.212.pdf
RDF
SPARQL
Query Languages
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
230
246
10.4230/LIPIcs.ICDT.2015.230
article
Separability by Short Subsequences and Subwords
Hofman, Piotr
Martens, Wim
The separability problem for regular languages asks, given two regular languages I and E, whether there exists a language S that separates the two, that is, includes I but contains nothing from E. Typically, S comes from a simple, less expressive class of languages than I and E. In general, a simple separator S can be seen as an approximation of I or as an explanation of how I and E are different. In a database context, separators can be used for explaining the result of regular path queries or for finding explanations for the difference between paths in a graph database, that is, how paths from given nodes u_1 to v_1 are different from those from u_2 to v_2. We study the complexity of separability of regular languages by combinations of subsequences or subwords of a given length k. The rationale is that the parameter k can be used to influence the size and simplicity of the separator. The emphasis of our study is on tracing the tractability of the problem.
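On finite samples of words, separability by Boolean combinations of "contains subsequence u" tests with |u| <= k reduces to comparing subsequence sets: two words satisfying exactly the same tests can never be separated. The sketch below makes this concrete for sample words only (an illustration, not the paper's algorithm, which works on full regular languages):

```python
from itertools import combinations

def subseqs_upto(w, k):
    """All subsequences (scattered subwords) of w of length at most k."""
    res = set()
    for l in range(k + 1):
        for idx in combinations(range(len(w)), l):
            res.add(''.join(w[i] for i in idx))
    return frozenset(res)

def separable(pos, neg, k):
    """Finite samples pos/neg are separable by a Boolean combination of
    'contains subsequence u' tests with |u| <= k iff no positive and
    negative word have identical sets of length-<=k subsequences."""
    return not any(subseqs_upto(p, k) == subseqs_upto(n, k)
                   for p in pos for n in neg)

# 'ab' and 'ba' share all length-1 subsequences but differ at length 2,
# so k = 1 cannot separate them while k = 2 can.
```

This also illustrates the role of the parameter k: increasing it refines the equivalence on words and makes more pairs separable, at the price of more complex separators.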
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.230/LIPIcs.ICDT.2015.230.pdf
separability
complexity
graph data
debugging
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
247
264
10.4230/LIPIcs.ICDT.2015.247
article
Process-Centric Views of Data-Driven Business Artifacts
Koutsos, Adrien
Vianu, Victor
Declarative, data-aware workflow models are becoming increasingly pervasive. While these have numerous benefits, classical process-centric specifications retain certain advantages. Workflow designers are used to development tools, such as BPMN or UML diagrams, that focus on control flow. Views describing valid sequences of tasks are also useful to provide stakeholders with high-level descriptions of the workflow, stripped of the accompanying data. In this paper we study the problem of recovering process-centric views from declarative, data-aware workflow specifications in a variant of IBM's business artifact model. We focus on the simplest and most natural process-centric views, specified by finite-state transition systems, and describing regular languages. The results characterize when process-centric views of artifact systems are regular, using both linear and branching-time semantics. We also study the impact of data dependencies on regularity of the views.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.247/LIPIcs.ICDT.2015.247.pdf
Workflows
data-aware
process-centric
views
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
265
276
10.4230/LIPIcs.ICDT.2015.265
article
On The I/O Complexity of Dynamic Distinct Counting
Hu, Xiaocheng
Tao, Yufei
Yang, Yi
Zhang, Shengyu
Zhou, Shuigeng
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains S in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.
In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps dramatically from 0 to almost linear as soon as the insertion cost drops slightly below constant.
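The two naive solutions described above can be mimicked in memory (a toy analogue with illustrative names, not the external-memory structures themselves): one pays at insertion time with a hash set, the other pays at query time by sorting.

```python
class HashCounter:
    """Incremental: constant expected work per insertion, free queries."""
    def __init__(self):
        self.seen = set()
        self.count = 0

    def insert(self, x):
        if x not in self.seen:     # hash lookup, O(1) expected
            self.seen.add(x)
            self.count += 1

    def query(self):
        return self.count          # already maintained

class ListCounter:
    """Lazy: cheap appends, query pays for a sort-and-scan."""
    def __init__(self):
        self.items = []

    def insert(self, x):
        self.items.append(x)       # just log the insertion

    def query(self):
        s = sorted(self.items)     # sort, then count boundaries of runs
        return sum(1 for i, v in enumerate(s) if i == 0 or v != s[i - 1])

# Both report 7 distinct values on this stream:
for C in (HashCounter, ListCounter):
    c = C()
    for x in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]:
        c.insert(x)
```

The paper's lower bound says that, in external memory, no structure can meaningfully beat both extremes at once.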
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.265/LIPIcs.ICDT.2015.265.pdf
distinct counting
lower bound
external memory
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
277
290
10.4230/LIPIcs.ICDT.2015.277
article
Shared-Constraint Range Reporting
Biswas, Sudip
Patil, Manish
Shah, Rahul
Thankachan, Sharma V.
Orthogonal range reporting is one of the classic and most fundamental data structure problems. A (2,1,1) query is a three-dimensional query with a two-sided constraint on the first dimension and a one-sided constraint on each of the second and third dimensions. Given a set of N points in three dimensions, a particular formulation of such a (2,1,1) query (known as four-sided range reporting in three dimensions) asks to report all those K points within a query region [a, b] x (-infinity, c] x [d, infinity). These queries have four constraints overall. In the word-RAM model, the best known structure capable of answering such queries with optimal query time takes O(N log^{epsilon} N) space, where epsilon > 0 is any positive constant. It has been shown that any external-memory structure answering such queries in optimal I/Os must use Omega(N log N / log log_B N) space (in words), where B is the block size [Arge et al., PODS 1999]. In this paper, we study a special type of (2,1,1) queries, where the query parameters a and c are the same, i.e., a = c. Even though the query is still four-sided, the number of independent constraints is only three; in other words, one constraint is shared. We call this the Shared-Constraint Range Reporting (SCRR) problem. We study this problem in both the internal and external memory models. In the RAM model, where coordinates can only be compared, we achieve a linear-space solution with O(log N + K) query time, matching the best known bound for three-dimensional dominance queries. In external memory, we present a linear-space structure with O(log_B N + log log N + K/B) query I/Os. We also present an I/O-optimal (i.e., O(log_B N + K/B) I/Os) data structure which occupies O(N log log N) words of space. We achieve these results by employing a novel divide-and-conquer approach. SCRR finds application in database queries containing sharing among the constraints.
We also show that SCRR queries naturally arise in many well known problems such as top-k color reporting, range skyline reporting and ranked document retrieval.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.277/LIPIcs.ICDT.2015.277.pdf
data structure
shared constraint
multi-slab
point partitioning
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
291
307
10.4230/LIPIcs.ICDT.2015.291
article
Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data
Ketsman, Bas
Neven, Frank
In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.291/LIPIcs.ICDT.2015.291.pdf
Coordination-free evaluation
conjunctive queries
broadcasting
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
308
323
10.4230/LIPIcs.ICDT.2015.308
article
Datalog Queries Distributing over Components
Ameloot, Tom J.
Ketsman, Bas
Neven, Frank
Zinn, Daniel
We investigate the class D of queries that distribute over components. These are the queries that can be evaluated by taking the union of the query results over the connected components of the database instance. We show that it is undecidable whether a (positive) Datalog program distributes over components. Additionally, we show that connected Datalog with Negation (the fragment of Datalog with Negation where all rules are connected) provides an effective syntax for Datalog with Negation programs that distribute over components under the stratified as well as under the well-founded semantics. As a corollary, we obtain a simple proof for one of the main results in previous work [Zinn, Green, and Ludäscher, ICDT 2012], namely, that the classic win-move query is in F_2 (a particular class of coordination-free queries).
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.308/LIPIcs.ICDT.2015.308.pdf
Datalog
stratified semantics
well-founded semantics
coordination-free evaluation
distributed databases
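The defining property in the abstract above, evaluating per connected component and taking the union, can be checked on a small example (a hedged sketch with hypothetical helper names; transitive closure stands in for an arbitrary monotone Datalog query):

```python
# Sketch: a query distributes over components if evaluating it on each
# connected component and unioning the results equals global evaluation.
from itertools import product

def components(edges, nodes):
    """Connected components of an undirected view of the edge relation (union-find)."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in edges:
        parent[find(a)] = find(b)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return list(comps.values())

def reach(edges):
    """Transitive closure: a simple monotone Datalog query."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), list(closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

edges = {(1, 2), (2, 3), (4, 5)}
nodes = {1, 2, 3, 4, 5}
per_comp = set()
for comp in components(edges, nodes):
    per_comp |= reach({e for e in edges if e[0] in comp})
assert per_comp == reach(edges)  # union over components = global result
```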
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
324
341
10.4230/LIPIcs.ICDT.2015.324
article
Distributed Streaming with Finite Memory
Neven, Frank
Schweikardt, Nicole
Servais, Frédéric
Tan, Tony
We introduce three formal models of distributed systems for query evaluation on massive databases: Distributed Streaming with Register Automata (DSAs), Distributed Streaming with Register Transducers (DSTs), and Distributed Streaming with Register Transducers and Joins (DSTJs). These models are based on the key-value paradigm, where the input is transformed into a dataset of key-value pairs, and for each key a local computation is performed on the values associated with that key, resulting in another set of key-value pairs. Computation proceeds in a constant number of rounds, where the result of each round is the input to the next, and the transformation to key-value pairs is required to be generic. The difference between the three models lies in the local computation part. In DSAs it is limited to making one pass over the input using a register automaton, while in DSTs it can make two passes: in the first pass it uses a finite-state automaton and in the second it uses a register transducer. The third model, DSTJs, is an extension of DSTs where local computations are capable of constructing the Cartesian product of two sets. We obtain the following results: (1) DSAs can evaluate first-order queries over bounded degree databases; (2) DSTs can evaluate semijoin algebra queries over arbitrary databases; (3) DSTJs can evaluate the whole relational algebra over arbitrary databases; (4) DSTJs are strictly stronger than DSTs, which in turn are strictly stronger than DSAs; (5) within DSAs, DSTs and DSTJs there is a strict hierarchy w.r.t. the number of rounds.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.324/LIPIcs.ICDT.2015.324.pdf
distributed systems
relational algebra
semijoin algebra
register automata
register transducers
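The key-value round structure underlying the models in the abstract above (only the round skeleton; none of the register-automaton machinery) resembles a MapReduce-style round, sketched here with hypothetical names:

```python
# Sketch of one key-value round: pairs are grouped by key, then a local
# computation per key emits new key-value pairs for the next round.
from collections import defaultdict

def kv_round(pairs, local):
    """Group pairs by key and apply a local computation per key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    out = []
    for k, vs in groups.items():
        out.extend(local(k, vs))
    return out

# Example local computation: emit the count of values seen per key.
pairs = [("a", 1), ("b", 2), ("a", 3)]
result = kv_round(pairs, lambda k, vs: [(k, len(vs))])
print(sorted(result))  # -> [('a', 2), ('b', 1)]
```

The result of one round can be fed back into `kv_round` for a constant number of iterations, mirroring the round structure in the models.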
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
342
362
10.4230/LIPIcs.ICDT.2015.342
article
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
Salimi, Babak
Bertossi, Leopoldo
In this work we establish and investigate connections between causality for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can thus be applied to the newer problem of determining actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.342/LIPIcs.ICDT.2015.342.pdf
causality
diagnosis
repairs
consistent query answering
integrity constraints
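The hypergraph hitting-set problems that the abstract above reduces causality questions to can be illustrated by a brute-force solver (a sketch only; the paper's algorithmic results concern the complexity of such problems, not this enumeration):

```python
# Sketch: minimum hitting set of a hypergraph by exhaustive search.
# A hitting set intersects every hyperedge; smallest ones correspond,
# per the abstract, to most responsible causes / smallest repairs.
from itertools import combinations

def min_hitting_set(hyperedges):
    universe = sorted(set().union(*hyperedges))
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            if all(set(cand) & edge for edge in hyperedges):
                return set(cand)
    return set(universe)

print(min_hitting_set([{1, 2}, {2, 3}, {3, 4}]))  # -> {1, 3}
```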
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
363
379
10.4230/LIPIcs.ICDT.2015.363
article
On the Relationship between Consistent Query Answering and Constraint Satisfaction Problems
Lutz, Carsten
Wolter, Frank
Recently, Fontaine has pointed out a connection between consistent query answering (CQA) and constraint satisfaction problems (CSP) [Fontaine, LICS 2013]. We investigate this connection more closely, identifying classes of CQA problems based on denial constraints and GAV constraints that correspond exactly to CSPs in the sense that a complexity classification of the CQA problems in each class is equivalent (up to FO-reductions) to classifying the complexity of all CSPs. We obtain these classes by admitting only monadic relations and only a single variable in denial constraints/GAVs and restricting queries to hypertree UCQs. We also observe that dropping the requirement of UCQs to be hypertrees corresponds to transitioning from CSP to its logical generalization MMSNP, and identify a further relaxation that corresponds to transitioning from MMSNP to GMSNP (also known as MMSNP_2). Moreover, we use the CSP connection to carry over decidability of FO-rewritability and Datalog-rewritability to some of the identified classes of CQA problems.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.363/LIPIcs.ICDT.2015.363.pdf
Consistent Query Answering
Constraint Satisfaction
Data Complexity
Dichotomies
Rewritability
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
380
397
10.4230/LIPIcs.ICDT.2015.380
article
On the Data Complexity of Consistent Query Answering over Graph Databases
Barceló, Pablo
Fontaine, Gaëlle
Areas in which graph databases are applied - such as the semantic web, social networks and scientific databases - are prone to inconsistency, mainly due to interoperability issues. This raises the need for understanding query answering over inconsistent graph databases in a framework that is simple yet general enough to accommodate many of its applications. We follow the well-known approach of consistent query answering (CQA), and study the data complexity of CQA over graph databases for regular path queries (RPQs) and regular path constraints (RPCs), which are frequently used. We concentrate on subset, superset and symmetric difference repairs. Without further restrictions, CQA is undecidable for the semantics based on superset and symmetric difference repairs, and Pi_2^P-complete for subset repairs. However, we provide several tractable restrictions on both RPCs and the structure of graph databases that lead to decidability, and even tractability of CQA. We also compare our results with those obtained for CQA in the context of relational databases.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.380/LIPIcs.ICDT.2015.380.pdf
graph databases
regular path queries
consistent query answering
description logics
rewrite systems