eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
0
0
10.4230/LIPIcs.ICDT.2015
article
LIPIcs, Volume 31, ICDT'15, Complete Volume
Arenas, Marcelo
Ugarte, Martín
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015/LIPIcs.ICDT.2015.pdf
Database Management, Normal forms, Schema and subschema, Query languages, Query processing, Relational databases, Distributed databases, Heterogeneous Databases, Online Information Services, Miscellaneous – Privacy, Office Automation: Workflow management, Performance Analysis and Design Aids: Formal
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
i
xvi
10.4230/LIPIcs.ICDT.2015.i
article
Title, Table of Contents, Preface, ICDT 2015 Test of Time Award, Organization, External Reviewers, List of Authors
Arenas, Marcelo
Ugarte, Martín
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.i/LIPIcs.ICDT.2015.i.pdf
Title
Table of Contents
Preface
ICDT 2015 Test of Time Award
Organization
External Reviewers
List of Authors
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
1
12
10.4230/LIPIcs.ICDT.2015.1
article
The Confounding Problem of Private Data Release (Invited Talk)
Cormode, Graham
The demands to make data available are growing ever louder, including open data initiatives and "data monetization". But the problem of doing so without disclosing confidential information is a subtle and difficult one. Is "private data release" an oxymoron? This paper (accompanying an invited talk) aims to delve into the motivations of data release, explore the challenges, and outline some of the current statistical approaches developed in response to this confounding problem.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.1/LIPIcs.ICDT.2015.1.pdf
privacy
anonymization
data release
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
13
14
10.4230/LIPIcs.ICDT.2015.13
article
Using Locality for Efficient Query Evaluation in Various Computation Models (Invited Talk)
Schweikardt, Nicole
In the database theory and logic literature, different notions of locality of queries have been studied, the most prominent being Hanf locality and Gaifman locality. These notions are designed so that, in order to evaluate a local query in a given database, it suffices to look only at small neighbourhoods around tuples of elements that belong to the database.
In this talk I want to give a survey of how to use locality for efficient query evaluation in various computation models. In particular, we will take a closer look at how to enumerate query results with constant delay, and at how to evaluate queries in a map-reduce-like setting [Neven et al., ICDT 2015] or in Pregel [Malewicz et al., SIGMOD 2010]. Also, we will have a closer look at how to transform a given local query into a form suitable for exploiting its locality.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.13/LIPIcs.ICDT.2015.13.pdf
query evaluation
locality
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
15
24
10.4230/LIPIcs.ICDT.2015.15
article
Large-Scale Similarity Joins With Guarantees (Invited Talk)
Pagh, Rasmus
The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating probability of error and improving the locality of data access.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.15/LIPIcs.ICDT.2015.15.pdf
Similarity join
filtering
locality-sensitive hashing
recall
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
25
43
10.4230/LIPIcs.ICDT.2015.25
article
A Declarative Framework for Linking Entities
Burdick, Douglas
Fagin, Ronald
Kolaitis, Phokion G.
Popa, Lucian
Tan, Wang-Chiew
The aim of this paper is to introduce and develop a truly declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on the systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons as to why two entities should be linked. We also consider extensions of the core language that capture collective entity resolution, by allowing inter-dependence between links.
We identify a class of "good" solutions for entity linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the "good" solutions, and the problem of finding the certain links, which are the links that appear in every "good" solution. We show that these problems are tractable for the core language, but may become intractable once we allow inter-dependence between link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.25/LIPIcs.ICDT.2015.25.pdf
entity linking
entity resolution
constraints
certain links
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
44
59
10.4230/LIPIcs.ICDT.2015.44
article
Asymptotic Determinacy of Path Queries using Union-of-Paths Views
Francis, Nadime
We consider the view determinacy problem over graph databases for queries defined as (possibly infinite) unions of path queries. These queries select pairs of nodes in a graph that are connected through a path whose length falls in a given set. A view specification is a set of such queries. We say that a view specification V determines a query Q if, for all databases D, the answers to V on D contain enough information to answer Q.
Our main result states that, given a view V, there exists an explicit bound that depends on V such that we can decide the determinacy problem for all queries that ask for a path longer than this bound, and provide first-order rewritings for the queries that are determined. We call this notion asymptotic determinacy. As a corollary, we can also compute the set of almost all path queries that are determined by V.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.44/LIPIcs.ICDT.2015.44.pdf
Graph databases
Views
Determinacy
Rewriting
Path queries
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
60
75
10.4230/LIPIcs.ICDT.2015.60
article
Games for Active XML Revisited
Schuster, Martin
Schwentick, Thomas
The paper studies the rewriting mechanisms for intensional documents in the Active XML framework, abstracted in the form of active context-free games. The safe rewriting problem studied in this paper is to decide whether the first player, Juliet, has a winning strategy for a given game and (nested) word; this corresponds to a successful rewriting strategy for a given intensional document. The paper examines several extensions to active context-free games.
The primary extension allows more expressive schemas (namely XML schemas and regular nested word languages) for both target and replacement languages and has the effect that games are played on nested words instead of (flat) words as in previous studies. Other extensions consider validation of input parameters of web services, and an alternative semantics based on insertion of service call results.
In general, the complexity of the safe rewriting problem is highly intractable (doubly exponential time), but the paper identifies interesting tractable cases.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.60/LIPIcs.ICDT.2015.60.pdf
Active XML
Computational Complexity
Nested Words
Rewriting Games
Semistructured Data
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
76
93
10.4230/LIPIcs.ICDT.2015.76
article
Answering Conjunctive Queries with Inequalities
Koutris, Paraschos
Milo, Tova
Roy, Sudeepa
Suciu, Dan
In this paper, we study the complexity of answering conjunctive queries (CQ) with inequalities. In particular, we compare the complexity of the query with and without inequalities. The main contribution of our work is a novel combinatorial technique that enables the use of any Select-Project-Join query plan for a given CQ without inequalities in answering the CQ with inequalities, with an additional factor in running time that only depends on the query. To achieve this, we define a new projection operator that keeps a small representation (independent of the size of the database) of the set of input tuples that map to each tuple in the output of the projection; this representation is used to evaluate all the inequalities in the query. Second, we generalize a result by Papadimitriou-Yannakakis [PODS'97] and give an alternative algorithm based on the color-coding technique [Alon, Yuster and Zwick, PODS'02] to evaluate a CQ with inequalities by using an algorithm for the CQ without inequalities. Third, we investigate the structure of the query graph, inequality graph, and the augmented query graph with inequalities, and show that even if the query and the inequality graphs have bounded treewidth, the augmented graph not only can have an unbounded treewidth but can also be NP-hard to evaluate. Further, we illustrate classes of queries and inequalities where the augmented graphs have unbounded treewidth, but the CQ with inequalities can be evaluated in poly-time. Finally, we give necessary properties and sufficient properties that allow a class of CQs to have poly-time combined complexity with respect to any inequality pattern.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.76/LIPIcs.ICDT.2015.76.pdf
query evaluation
conjunctive query
inequality
treewidth
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
94
109
10.4230/LIPIcs.ICDT.2015.94
article
SQL's Three-Valued Logic and Certain Answers
Libkin, Leonid
SQL uses three-valued logic for evaluating queries on databases with nulls. The standard theoretical approach to evaluating queries on incomplete databases is to compute certain answers. While these two cannot coincide, due to a significant complexity mismatch, we can still ask whether the two schemes are related in any way. For instance, does SQL always produce answers we can be certain about?
This is not so: SQL's semantics and the certain answers semantics can be totally unrelated. We show, however, that a slight modification of the three-valued semantics for relational calculus queries can provide the required certainty guarantees. The key point of the new scheme is to fully utilize the three-valued semantics, and classify answers not into certain or non-certain, as was done before, but rather into certainly true, certainly false, or unknown. This yields relatively small changes to the evaluation procedure, which we consider at the level of both declarative (relational calculus) and procedural (relational algebra) queries. We also introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values.
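The certainly-true / certainly-false / unknown classification can be illustrated with a small sketch in Kleene three-valued logic (a toy illustration, not the paper's evaluation procedure; all names here are made up), with Python's None playing the role of SQL's NULL/unknown:

```python
# Kleene three-valued logic over {True, False, None},
# where None stands for "unknown" (SQL's NULL).

def and3(a, b):
    # false AND anything is false; otherwise unknown dominates
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    # true OR anything is true; otherwise unknown dominates
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else (not a)

def eq3(x, y):
    # SQL-style equality: any comparison involving NULL is unknown
    if x is None or y is None:
        return None
    return x == y

# A tuple (1, NULL) evaluated against the condition  a = 1 AND b = 2:
row = (1, None)
cond = and3(eq3(row[0], 1), eq3(row[1], 2))
# cond is None: the tuple is neither certainly in nor certainly out.
```

Under this scheme an answer tuple is returned as certain only when its condition evaluates to True, and certainly excluded only when it evaluates to False.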
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.94/LIPIcs.ICDT.2015.94.pdf
Null values
incomplete information
query evaluation
three-valued logic
certain answers
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
110
126
10.4230/LIPIcs.ICDT.2015.110
article
A Trichotomy in the Complexity of Counting Answers to Conjunctive Queries
Chen, Hubie
Mengel, Stefan
Conjunctive queries are basic and heavily studied database queries; in relational algebra, they are the select-project-join queries. In this article, we study the fundamental problem of counting, given a conjunctive query and a relational database, the number of answers to the query on the database. In particular, we study the complexity of this problem relative to sets of conjunctive queries. We present a trichotomy theorem, which shows essentially that this problem on a set of conjunctive queries is either tractable, equivalent to the parameterized CLIQUE problem, or as hard as the parameterized counting CLIQUE problem; the criterion describing which of these situations occurs is simply stated, in terms of graph-theoretic conditions.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.110/LIPIcs.ICDT.2015.110.pdf
database theory
query answering
conjunctive queries
counting complexity
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
127
143
10.4230/LIPIcs.ICDT.2015.127
article
Learning Tree Patterns from Example Graphs
Cohen, Sara
Weiss, Yaacov Y.
This paper investigates the problem of learning tree patterns that return nodes with a given set of labels, from example graphs provided by the user. Example graphs are annotated by the user as being either positive or negative. The goal is then to determine whether there exists a tree pattern returning tuples of nodes with the given labels in each of the positive examples, but in none of the negative examples, and, furthermore, to find one such pattern if it exists. These are called the satisfiability and learning problems, respectively.
This paper thoroughly investigates the satisfiability and learning problems in a variety of settings. In particular, we consider example sets that (1) may contain only positive examples, or both positive and negative examples, (2) may contain directed or undirected graphs, and (3) may have multiple occurrences of labels or be uniquely labeled (to some degree). In addition, we consider tree patterns of different types that can allow, or prohibit, wildcard labeled nodes and descendant edges. We also consider two different semantics for mapping tree patterns to graphs. The complexity of satisfiability is determined for the different combinations of settings. For cases in which satisfiability is polynomial, it is also shown that learning is polynomial (this is non-trivial, as satisfying patterns may be exponential in size). Finally, the minimal learning problem, i.e., that of finding a minimal-sized satisfying pattern, is studied for cases in which satisfiability is polynomial.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.127/LIPIcs.ICDT.2015.127.pdf
tree patterns
learning
examples
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
144
160
10.4230/LIPIcs.ICDT.2015.144
article
Characterizing XML Twig Queries with Examples
Staworko, Slawek
Wieczorek, Piotr
Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with?
We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that they are not only characterizable, but characterizable with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.144/LIPIcs.ICDT.2015.144.pdf
Query characterization
Query examples
Query fitting
Twig queries
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
161
176
10.4230/LIPIcs.ICDT.2015.161
article
The Product Homomorphism Problem and Applications
ten Cate, Balder
Dalmau, Victor
The product homomorphism problem (PHP) takes as input a finite collection of structures A_1, ..., A_n and a structure B, and asks whether there is a homomorphism from the direct product A_1 x A_2 x ... x A_n to B. We pinpoint the computational complexity of this problem. Our motivation stems from the fact that PHP naturally arises in different areas of database theory. In particular, it is equivalent to the problem of determining whether a relation is definable by a conjunctive query, and to the problem of deciding whether there exists a schema mapping that fits a given collection of positive and negative data examples. We apply our results to obtain complexity bounds for these problems.
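The PHP itself is easy to state operationally. The following brute-force sketch is purely illustrative (exponential time, restricted to digraphs, i.e., a single binary relation; it is not the paper's contribution, which is the complexity analysis):

```python
from itertools import product

def graph_product(graphs):
    """Direct (categorical) product of digraphs given as edge sets:
    a pair of node tuples is an edge iff every component pair is an edge."""
    edges = set()
    for es in product(*graphs):
        edges.add((tuple(e[0] for e in es), tuple(e[1] for e in es)))
    return edges

def has_hom(g, h):
    """Brute-force homomorphism test between digraphs (tiny inputs only):
    try every mapping of g's nodes to h's nodes."""
    dg = sorted({x for e in g for x in e})
    dh = sorted({x for e in h for x in e})
    for f in product(dh, repeat=len(dg)):
        m = dict(zip(dg, f))
        if all((m[u], m[v]) in h for (u, v) in g):
            return True
    return False

# A PHP instance: is there a homomorphism from A_1 x A_2 to B?
A1, A2 = {(0, 1)}, {(0, 1)}
B = {('x', 'x')}                           # a single self-loop
ok = has_hom(graph_product([A1, A2]), B)   # True: map every node to 'x'
```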
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.161/LIPIcs.ICDT.2015.161.pdf
Homomorphisms
Direct Product
Data Examples
Definability
Conjunctive Queries
Schema Mappings
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
177
194
10.4230/LIPIcs.ICDT.2015.177
article
Regular Queries on Graph Databases
Reutter, Juan L.
Romero, Miguel
Vardi, Moshe Y.
Graph databases are currently one of the most popular paradigms for storing data. One of the key conceptual differences between graph and relational databases is the focus on navigational queries that ask whether some nodes are connected by paths satisfying certain restrictions. This focus has driven the definition of several different query languages and the subsequent study of their fundamental properties.
We define the graph query language of Regular Queries, which is a natural extension of unions of conjunctive 2-way regular path queries (UC2RPQs) and unions of conjunctive nested 2-way regular path queries (UCN2RPQs). Regular queries allow expressing complex regular patterns between nodes. We formalize regular queries as nonrecursive Datalog programs with transitive closure rules. This language has been previously considered, but its algorithmic properties are not well understood.
Our main contribution is to show elementary tight bounds for the containment problem for regular queries. Specifically, we show that this problem is 2EXPSPACE-complete. For all extensions of regular queries known to date, the containment problem turns out to be non-elementary. Together with the fact that evaluating regular queries is not harder than evaluating UCN2RPQs, our results show that regular queries achieve a good balance between expressiveness and complexity, and constitute a well-behaved class that deserves further investigation.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.177/LIPIcs.ICDT.2015.177.pdf
graph databases
conjunctive regular path queries
regular queries
containment
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
195
211
10.4230/LIPIcs.ICDT.2015.195
article
Complexity and Expressiveness of ShEx for RDF
Staworko, Slawek
Boneva, Iovka
Labra Gayo, Jose E.
Hym, Samuel
Prud'hommeaux, Eric G.
Solbrig, Harold
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. A ShEx assigns types to the nodes of an RDF graph and allows one to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.195/LIPIcs.ICDT.2015.195.pdf
RDF
Schema
Graph topology
Validation
Complexity
Expressiveness
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
212
229
10.4230/LIPIcs.ICDT.2015.212
article
CONSTRUCT Queries in SPARQL
Kostylev, Egor V.
Reutter, Juan L.
Ugarte, Martín
SPARQL has become the most popular language for querying RDF datasets, the standard data model for representing information in the Web. This query language has received a good deal of attention in the last few years: two versions of W3C standards have been issued, several SPARQL query engines have been deployed, and important theoretical foundations have been laid. However, many fundamental aspects of SPARQL queries are not yet fully understood. To this end, it is crucial to understand the correspondence between SPARQL and well-developed frameworks like relational algebra or first order logic. But one of the main obstacles on the way to such understanding is the fact that the well-studied fragments of SPARQL do not produce RDF as output.
In this paper we embark on the study of SPARQL CONSTRUCT queries, that is, queries which output RDF graphs. This class of queries takes its rightful place in the standards and implementations, but contrary to SELECT queries, it has not yet attracted worthwhile theoretical research. Under this framework we are able to establish a strong connection between SPARQL and well-known logical and database formalisms. In particular, the fragment which does not allow for blank nodes in output templates corresponds to first order queries, its well-designed sub-fragment corresponds to positive first order queries, and the general language can be re-stated as a data exchange setting. These correspondences allow us to conclude that the general language is not composable, but the aforementioned blank-free fragments are. Finally, we enrich SPARQL with a recursion operator and establish fundamental properties of this extension.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.212/LIPIcs.ICDT.2015.212.pdf
RDF
SPARQL
Query Languages
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
230
246
10.4230/LIPIcs.ICDT.2015.230
article
Separability by Short Subsequences and Subwords
Hofman, Piotr
Martens, Wim
The separability problem for regular languages asks, given two regular languages I and E, whether there exists a language S that separates the two, that is, includes I but contains nothing from E. Typically, S comes from a simple, less expressive class of languages than I and E. In general, a simple separator S can be seen as an approximation of I or as an explanation of how I and E are different. In a database context, separators can be used for explaining the result of regular path queries or for finding explanations for the difference between paths in a graph database, that is, how paths from given nodes u_1 to v_1 are different from those from u_2 to v_2. We study the complexity of separability of regular languages by combinations of subsequences or subwords of a given length k. The rationale is that the parameter k can be used to influence the size and simplicity of the separator. The emphasis of our study is on tracing the tractability of the problem.
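On finite samples of words, separability by Boolean combinations of "contains subsequence u" tests with |u| <= k reduces to comparing subsequence sets: two words satisfying exactly the same tests can never be separated. The sketch below makes this concrete for sample words only (an illustration, not the paper's algorithm, which works on full regular languages):

```python
from itertools import combinations

def subseqs_upto(w, k):
    """All subsequences (scattered subwords) of w of length at most k."""
    res = set()
    for l in range(k + 1):
        for idx in combinations(range(len(w)), l):
            res.add(''.join(w[i] for i in idx))
    return frozenset(res)

def separable(pos, neg, k):
    """Finite samples pos/neg are separable by a Boolean combination of
    'contains subsequence u' tests with |u| <= k iff no positive and
    negative word have identical sets of length-<=k subsequences."""
    return not any(subseqs_upto(p, k) == subseqs_upto(n, k)
                   for p in pos for n in neg)

# 'ab' and 'ba' share all length-1 subsequences but differ at length 2,
# so k = 1 cannot separate them while k = 2 can.
```

This also illustrates the role of the parameter k: increasing it refines the equivalence on words and makes more pairs separable, at the price of more complex separators.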
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.230/LIPIcs.ICDT.2015.230.pdf
separability
complexity
graph data
debugging
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
247
264
10.4230/LIPIcs.ICDT.2015.247
article
Process-Centric Views of Data-Driven Business Artifacts
Koutsos, Adrien
Vianu, Victor
Declarative, data-aware workflow models are becoming increasingly pervasive. While these have numerous benefits, classical process-centric specifications retain certain advantages. Workflow designers are used to development tools, such as BPMN or UML diagrams, that focus on control flow. Views describing valid sequences of tasks are also useful to provide stakeholders with high-level descriptions of the workflow, stripped of the accompanying data. In this paper we study the problem of recovering process-centric views from declarative, data-aware workflow specifications in a variant of IBM's business artifact model. We focus on the simplest and most natural process-centric views, specified by finite-state transition systems, and describing regular languages. The results characterize when process-centric views of artifact systems are regular, using both linear and branching-time semantics. We also study the impact of data dependencies on regularity of the views.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.247/LIPIcs.ICDT.2015.247.pdf
Workflows
data-aware
process-centric
views
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
265
276
10.4230/LIPIcs.ICDT.2015.265
article
On The I/O Complexity of Dynamic Distinct Counting
Hu, Xiaocheng
Tao, Yufei
Yang, Yi
Zhang, Shengyu
Zhou, Shuigeng
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains S in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.
In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps dramatically from 0 to almost linear as soon as the insertion cost drops slightly below constant.
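The two naive solutions described above can be mimicked in memory (a toy analogue with illustrative names, not the external-memory structures themselves): one pays at insertion time with a hash set, the other pays at query time by sorting.

```python
class HashCounter:
    """Incremental: constant expected work per insertion, free queries."""
    def __init__(self):
        self.seen = set()
        self.count = 0

    def insert(self, x):
        if x not in self.seen:     # hash lookup, O(1) expected
            self.seen.add(x)
            self.count += 1

    def query(self):
        return self.count          # already maintained

class ListCounter:
    """Lazy: cheap appends, query pays for a sort-and-scan."""
    def __init__(self):
        self.items = []

    def insert(self, x):
        self.items.append(x)       # just log the insertion

    def query(self):
        s = sorted(self.items)     # sort, then count boundaries of runs
        return sum(1 for i, v in enumerate(s) if i == 0 or v != s[i - 1])

# Both report 7 distinct values on this stream:
for C in (HashCounter, ListCounter):
    c = C()
    for x in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]:
        c.insert(x)
```

The paper's lower bound says that, in external memory, no structure can meaningfully beat both extremes at once.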
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.265/LIPIcs.ICDT.2015.265.pdf
distinct counting
lower bound
external memory
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
277
290
10.4230/LIPIcs.ICDT.2015.277
article
Shared-Constraint Range Reporting
Biswas, Sudip
Patil, Manish
Shah, Rahul
Thankachan, Sharma V.
Orthogonal range reporting is one of the classic and most fundamental data structure problems. A (2,1,1) query is a three-dimensional query with a two-sided constraint on the first dimension and a one-sided constraint on each of the second and third dimensions. Given a set of N points in three dimensions, a particular formulation of such a (2,1,1) query (known as four-sided range reporting in three dimensions) asks to report all those K points within a query region [a, b] x (-infinity, c] x [d, infinity). These queries have four constraints overall. In the word-RAM model, the best known structure capable of answering such queries with optimal query time takes O(N log^{epsilon} N) space, where epsilon > 0 is any positive constant. It has been shown that any external-memory structure answering such queries in optimal I/Os must use Omega(N log N / log log_B N) space (in words), where B is the block size [Arge et al., PODS 1999]. In this paper, we study a special type of (2,1,1) queries, where the query parameters a and c are the same, i.e., a = c. Even though the query is still four-sided, the number of independent constraints is only three; in other words, one constraint is shared. We call this the Shared-Constraint Range Reporting (SCRR) problem. We study this problem in both the internal and external memory models. In the RAM model, where coordinates can only be compared, we achieve a linear-space solution with O(log N + K) query time, matching the best known bound for three-dimensional dominance queries. In external memory, we present a linear-space structure with O(log_B N + log log N + K/B) query I/Os. We also present an I/O-optimal (i.e., O(log_B N + K/B) I/Os) data structure which occupies O(N log log N) words of space. We achieve these results by employing a novel divide-and-conquer approach. SCRR finds application in database queries containing sharing among the constraints.
We also show that SCRR queries naturally arise in many well known problems such as top-k color reporting, range skyline reporting and ranked document retrieval.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.277/LIPIcs.ICDT.2015.277.pdf
data structure
shared constraint
multi-slab
point partitioning
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
291
307
10.4230/LIPIcs.ICDT.2015.291
article
Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data
Ketsman, Bas
Neven, Frank
In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.291/LIPIcs.ICDT.2015.291.pdf
Coordination-free evaluation
conjunctive queries
broadcasting
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
308
323
10.4230/LIPIcs.ICDT.2015.308
article
Datalog Queries Distributing over Components
Ameloot, Tom J.
Ketsman, Bas
Neven, Frank
Zinn, Daniel
We investigate the class D of queries that distribute over components. These are the queries that can be evaluated by taking the union of the query results over the connected components of the database instance. We show that it is undecidable whether a (positive) Datalog program distributes over components. Additionally, we show that connected Datalog with Negation (the fragment of Datalog with Negation where all rules are connected) provides an effective syntax for Datalog with Negation programs that distribute over components under the stratified as well as under the well-founded semantics. As a corollary, we obtain a simple proof for one of the main results in previous work [Zinn, Green, and Ludäscher, ICDT 2012], namely, that the classic win-move query is in F_2 (a particular class of coordination-free queries).
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.308/LIPIcs.ICDT.2015.308.pdf
Datalog
stratified semantics
well-founded semantics
coordination-free evaluation
distributed databases
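The defining property in the abstract above, evaluating per connected component and taking the union, can be checked on a small example (a hedged sketch with hypothetical helper names; transitive closure stands in for an arbitrary monotone Datalog query):

```python
# Sketch: a query distributes over components if evaluating it on each
# connected component and unioning the results equals global evaluation.
from itertools import product

def components(edges, nodes):
    """Connected components of an undirected view of the edge relation (union-find)."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in edges:
        parent[find(a)] = find(b)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return list(comps.values())

def reach(edges):
    """Transitive closure: a simple monotone Datalog query."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), list(closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

edges = {(1, 2), (2, 3), (4, 5)}
nodes = {1, 2, 3, 4, 5}
per_comp = set()
for comp in components(edges, nodes):
    per_comp |= reach({e for e in edges if e[0] in comp})
assert per_comp == reach(edges)  # union over components = global result
```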
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
324
341
10.4230/LIPIcs.ICDT.2015.324
article
Distributed Streaming with Finite Memory
Neven, Frank
Schweikardt, Nicole
Servais, Frédéric
Tan, Tony
We introduce three formal models of distributed systems for query evaluation on massive databases: Distributed Streaming with Register Automata (DSAs), Distributed Streaming with Register Transducers (DSTs), and Distributed Streaming with Register Transducers and Joins (DSTJs). These models are based on the key-value paradigm, where the input is transformed into a dataset of key-value pairs, and for each key a local computation is performed on the values associated with that key, resulting in another set of key-value pairs. Computation proceeds in a constant number of rounds, where the result of each round is the input to the next, and the transformation to key-value pairs is required to be generic. The difference between the three models lies in the local computation part. In DSAs it is limited to making one pass over the input using a register automaton, while in DSTs it can make two passes: in the first pass it uses a finite-state automaton and in the second it uses a register transducer. The third model, DSTJs, is an extension of DSTs where local computations are capable of constructing the Cartesian product of two sets. We obtain the following results: (1) DSAs can evaluate first-order queries over bounded degree databases; (2) DSTs can evaluate semijoin algebra queries over arbitrary databases; (3) DSTJs can evaluate the whole relational algebra over arbitrary databases; (4) DSTJs are strictly stronger than DSTs, which in turn are strictly stronger than DSAs; (5) within DSAs, DSTs and DSTJs there is a strict hierarchy w.r.t. the number of rounds.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.324/LIPIcs.ICDT.2015.324.pdf
distributed systems
relational algebra
semijoin algebra
register automata
register transducers
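The key-value round structure underlying the models in the abstract above (only the round skeleton; none of the register-automaton machinery) resembles a MapReduce-style round, sketched here with hypothetical names:

```python
# Sketch of one key-value round: pairs are grouped by key, then a local
# computation per key emits new key-value pairs for the next round.
from collections import defaultdict

def kv_round(pairs, local):
    """Group pairs by key and apply a local computation per key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    out = []
    for k, vs in groups.items():
        out.extend(local(k, vs))
    return out

# Example local computation: emit the count of values seen per key.
pairs = [("a", 1), ("b", 2), ("a", 3)]
result = kv_round(pairs, lambda k, vs: [(k, len(vs))])
print(sorted(result))  # -> [('a', 2), ('b', 1)]
```

The result of one round can be fed back into `kv_round` for a constant number of iterations, mirroring the round structure in the models.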
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
342
362
10.4230/LIPIcs.ICDT.2015.342
article
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
Salimi, Babak
Bertossi, Leopoldo
In this work we establish and investigate connections between causality for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can thus be applied to the newer problem of determining actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.342/LIPIcs.ICDT.2015.342.pdf
causality
diagnosis
repairs
consistent query answering
integrity constraints
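The hypergraph hitting-set problems that the abstract above reduces causality questions to can be illustrated by a brute-force solver (a sketch only; the paper's algorithmic results concern the complexity of such problems, not this enumeration):

```python
# Sketch: minimum hitting set of a hypergraph by exhaustive search.
# A hitting set intersects every hyperedge; smallest ones correspond,
# per the abstract, to most responsible causes / smallest repairs.
from itertools import combinations

def min_hitting_set(hyperedges):
    universe = sorted(set().union(*hyperedges))
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            if all(set(cand) & edge for edge in hyperedges):
                return set(cand)
    return set(universe)

print(min_hitting_set([{1, 2}, {2, 3}, {3, 4}]))  # -> {1, 3}
```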
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
363
379
10.4230/LIPIcs.ICDT.2015.363
article
On the Relationship between Consistent Query Answering and Constraint Satisfaction Problems
Lutz, Carsten
Wolter, Frank
Recently, Fontaine has pointed out a connection between consistent query answering (CQA) and constraint satisfaction problems (CSP) [Fontaine, LICS 2013]. We investigate this connection more closely, identifying classes of CQA problems based on denial constraints and GAV constraints that correspond exactly to CSPs in the sense that a complexity classification of the CQA problems in each class is equivalent (up to FO-reductions) to classifying the complexity of all CSPs. We obtain these classes by admitting only monadic relations and only a single variable in denial constraints/GAVs and restricting queries to hypertree UCQs. We also observe that dropping the requirement of UCQs to be hypertrees corresponds to transitioning from CSP to its logical generalization MMSNP, and identify a further relaxation that corresponds to transitioning from MMSNP to GMSNP (also known as MMSNP_2). Moreover, we use the CSP connection to carry over decidability of FO-rewritability and Datalog-rewritability to some of the identified classes of CQA problems.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.363/LIPIcs.ICDT.2015.363.pdf
Consistent Query Answering
Constraint Satisfaction
Data Complexity
Dichotomies
Rewritability
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2015-03-19
31
380
397
10.4230/LIPIcs.ICDT.2015.380
article
On the Data Complexity of Consistent Query Answering over Graph Databases
Barceló, Pablo
Fontaine, Gaëlle
Areas in which graph databases are applied - such as the semantic web, social networks and scientific databases - are prone to inconsistency, mainly due to interoperability issues. This raises the need for understanding query answering over inconsistent graph databases in a framework that is simple yet general enough to accommodate many of its applications. We follow the well-known approach of consistent query answering (CQA), and study the data complexity of CQA over graph databases for regular path queries (RPQs) and regular path constraints (RPCs), which are frequently used. We concentrate on subset, superset and symmetric difference repairs. Without further restrictions, CQA is undecidable for the semantics based on superset and symmetric difference repairs, and Pi_2^P-complete for subset repairs. However, we provide several tractable restrictions on both RPCs and the structure of graph databases that lead to decidability, and even tractability of CQA. We also compare our results with those obtained for CQA in the context of relational databases.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol031-icdt2015/LIPIcs.ICDT.2015.380/LIPIcs.ICDT.2015.380.pdf
graph databases
regular path queries
consistent query answering
description logics
rewrite systems