18th International Conference on Database Theory (ICDT 2015), ICDT 2015, March 23-27, 2015, Brussels, Belgium
ICDT 2015
March 23-27, 2015
Brussels, Belgium
International Conference on Database Theory
ICDT
https://databasetheory.org/icdt-pages
https://dblp.org/db/conf/icdt
Leibniz International Proceedings in Informatics
LIPIcs
https://www.dagstuhl.de/dagpub/1868-8969
https://dblp.org/db/series/lipics
1868-8969
Marcelo
Arenas
Marcelo Arenas
Martín
Ugarte
Martín Ugarte
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
31
2015
978-3-939897-79-8
https://www.dagstuhl.de/dagpub/978-3-939897-79-8
Title, Table of Contents, Preface, ICDT 2015 Test of Time Award, Organization, External Reviewers, List of Authors
Title
Table of Contents
Preface
ICDT 2015 Test of Time Award
Organization
External Reviewers
List of Authors
i-xvi
Front Matter
Marcelo
Arenas
Marcelo Arenas
Martín
Ugarte
Martín Ugarte
10.4230/LIPIcs.ICDT.2015.i
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
The Confounding Problem of Private Data Release (Invited Talk)
The demands to make data available are growing ever louder, including open data initiatives and "data monetization". But the problem of doing so without disclosing confidential information is a subtle and difficult one. Is "private data release" an oxymoron? This paper (accompanying an invited talk) aims to delve into the motivations of data release, explore the challenges, and outline some of the current statistical approaches developed in response to this confounding problem.
privacy
anonymization
data release
1-12
Invited Talk
Graham
Cormode
Graham Cormode
10.4230/LIPIcs.ICDT.2015.1
Graham Cormode. Personal privacy vs population privacy: Learning to attack anonymization. In ACM SIGKDD, August 2011.
Graham Cormode, Magda Procopiuc, Divesh Srivastava, and Thanh Tran. Differentially private publication of sparse data. In International Conference on Database Theory, 2012.
Graham Cormode, Magda Procopiuc, Divesh Srivastava, Xiaokui Xiao, and Jun Zhang. Privbayes: Private data release via bayesian networks. In ACM SIGMOD International Conference on Management of Data (SIGMOD), 2014.
Cynthia Dwork. A firm foundation for private data analysis. Communications of the ACM, 54(1):86-95, 2011.
Dan Kifer. Attacks on privacy and deFinetti’s theorem. In ACM SIGMOD International Conference on Management of Data, 2009.
Ashwin Machanavajjhala, Daniel Kifer, John M. Abowd, Johannes Gehrke, and Lars Vilhuber. Privacy: Theory meets practice on the map. In IEEE International Conference on Data Engineering, 2008.
Paul Ohm. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57(6):1701-1778, August 2010.
Pierangela Samarati and Latanya Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, SRI, 1998.
L. Sweeney. Simple demographics often identify people uniquely. Technical Report Data Privacy Working Paper 3, Carnegie Mellon University, 2000.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Using Locality for Efficient Query Evaluation in Various Computation Models (Invited Talk)
In the database theory and logic literature, different notions of locality of queries have been studied, the most prominent being Hanf locality and Gaifman locality. These notions are designed so that, in order to evaluate a local query in a given database, it suffices to look only at small neighbourhoods around tuples of elements that belong to the database.
In this talk I want to give a survey of how to use locality for efficient query evaluation in various computation models. In particular, we will take a closer look at how to enumerate query results with constant delay, and at how to evaluate queries in a map-reduce like setting [Neven et al., ICDT 2015] or in Pregel [Malewicz et al., SIGMOD 2010]. Also, we will have a closer look at how to transform a given local query into a form suitable for exploiting its locality.
query evaluation
locality
13-14
Invited Talk
Nicole
Schweikardt
Nicole Schweikardt
10.4230/LIPIcs.ICDT.2015.13
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Large-Scale Similarity Joins With Guarantees (Invited Talk)
The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating probability of error and improving the locality of data access.
Similarity join
filtering
locality-sensitive hashing
recall
15-24
Invited Talk
Rasmus
Pagh
Rasmus Pagh
10.4230/LIPIcs.ICDT.2015.15
Panagiotis Achlioptas, Bernhard Schölkopf, and Karsten Borgwardt. Two-locus association mapping in subquadratic time. In Proceedings of KDD, pages 726-734. ACM, 2011.
Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. arXiv preprint arXiv:1501.01062, 2015.
Arvind Arasu, Venkatesh Ganti, and Raghav Kaushik. Efficient exact set-similarity joins. In Proceedings of VLDB, pages 918-929, 2006.
Nikolaus Augsten and Michael H Böhlen. Similarity joins in relational database systems. Synthesis Lectures on Data Management, 5(5):1-124, 2013.
Bahman Bahmani, Ashish Goel, and Rajendra Shinde. Efficient distributed locality sensitive hashing. In Proceedings of CIKM, pages 2174-2178, 2012.
Roberto J. Bayardo, Yiming Ma, and Ramakrishnan Srikant. Scaling up all pairs similarity search. In Proceedings of WWW, pages 131-140, 2007.
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of STOC, pages 380-388, 2002.
Surajit Chaudhuri, Venkatesh Ganti, and Raghav Kaushik. A primitive operator for similarity joins in data cleaning. In Proceedings of ICDE, page 5, 2006.
Yun Chen and Jignesh M Patel. Efficient evaluation of all-nearest-neighbor queries. In Proceedings of ICDE, pages 1056-1065. IEEE, 2007.
Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang. Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng., 13(1):64-78, 2001.
Abhinandan Das, Mayur Datar, Ashutosh Garg, and ShyamSundar Rajaram. Google news personalization: scalable online collaborative filtering. In Proceedings of WWW, pages 271-280, 2007.
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of SOCG, pages 253-262, 2004.
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of VLDB, pages 518-529, 1999.
Dan Greene, Michal Parnas, and Frances Yao. Multi-index hashing for information retrieval. In Proceedings of FOCS, pages 722-731. IEEE, 1994.
Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of computing, 8(1):321-350, 2012.
Theodore E Harris. The theory of branching processes. Courier Dover Publications, 2002.
Edwin H Jacox and Hanan Samet. Metric space similarity joins. ACM Transactions on Database Systems (TODS), 33(2):7, 2008.
Yu Jiang, Dong Deng, Jiannan Wang, Guoliang Li, and Jianhua Feng. Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints. In Proceedings of Joint EDBT/ICDT Workshops, pages 341-348. ACM, 2013.
Guoliang Li, Dong Deng, Jiannan Wang, and Jianhua Feng. Pass-join: A partition-based method for similarity joins. Proceedings of the VLDB Endowment, 5(3):253-264, 2011.
Yucheng Low and Alice X Zheng. Fast top-k similarity queries via matrix compression. In Proceedings of CIKM, pages 2070-2074. ACM, 2012.
Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, and Haiyong Wang. String similarity measures and joins with synonyms. In Proceedings of SIGMOD, pages 373-384. ACM, 2013.
Marvin L Minsky and Seymour A Papert. Perceptrons - Expanded Edition: An Introduction to Computational Geometry. MIT press, 1987.
Mohammad Norouzi, Ali Punjani, and David J Fleet. Fast search in hamming space with multi-index hashing. In Proceedings of CVPR, pages 3108-3115. IEEE, 2012.
Rasmus Pagh, Ninh Pham, Francesco Silvestri, and Morten Stöckel. I/O-efficient similarity join in high dimensions. Manuscript, 2015.
Rasmus Pagh, Morten Stöckel, and David P. Woodruff. Is min-wise hashing optimal for summarizing set intersection? In Proceedings of PODS, pages 109-120. ACM, 2014.
Ramamohan Paturi, Sanguthevar Rajasekaran, and John Reif. The Light Bulb Problem. Information and Computation, 117(2):187-192, March 1995.
Yasin N Silva, Walid G Aref, and Mohamed H Ali. The similarity join database operator. In Proceedings of ICDE, pages 892-903. IEEE, 2010.
Gregory Valiant. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas. In Proceedings of FOCS, pages 11-20. IEEE, October 2012.
Jeffrey Scott Vitter. Algorithms and Data Structures for External Memory. Now Publishers Inc., 2008.
Jiannan Wang, Guoliang Li, and Jianhua Feng. Fast-join: An efficient method for fuzzy token matching based string similarity join. In Proceedings of ICDE, pages 458-469. IEEE, 2011.
Jiannan Wang, Guoliang Li, and Jianhua Feng. Can we beat the prefix filtering?: an adaptive framework for similarity join and search. In Proceedings of SIGMOD, pages 85-96. ACM, 2012.
Ye Wang, Ahmed Metwally, and Srinivasan Parthasarathy. Scalable all-pairs similarity search in metric spaces. In Proceedings of KDD, pages 829-837, 2013.
Chenyi Xia, Hongjun Lu, Beng Chin Ooi, and Jing Hu. Gorder: an efficient method for knn join processing. In Proceedings of VLDB, pages 756-767. VLDB Endowment, 2004.
Chuan Xiao, Wei Wang, Xuemin Lin, and Jeffrey Xu Yu. Efficient similarity joins for near duplicate detection. In Proceedings of WWW, pages 131-140, 2008.
Reza Bosagh Zadeh and Ashish Goel. Dimension independent similarity computation. The Journal of Machine Learning Research, 14(1):1605-1626, 2013.
Xiang Zhang, Fei Zou, and Wei Wang. Fastanova: an efficient algorithm for genome-wide association study. In Proceedings of KDD, pages 821-829. ACM, 2008.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
A Declarative Framework for Linking Entities
The aim of this paper is to introduce and develop a truly declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on the systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons as to why two entities should be linked. We also consider extensions of the core language that capture collective entity resolution, by allowing inter-dependence between links.
We identify a class of "good" solutions for entity linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the "good" solutions, and the problem of finding the certain links, which are the links that appear in every "good" solution. We show that these problems are tractable for the core language, but may become intractable once we allow inter-dependence between link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
entity linking
entity resolution
constraints
certain links
25-43
Regular Paper
Douglas
Burdick
Douglas Burdick
Ronald
Fagin
Ronald Fagin
Phokion G.
Kolaitis
Phokion G. Kolaitis
Lucian
Popa
Lucian Popa
Wang-Chiew
Tan
Wang-Chiew Tan
10.4230/LIPIcs.ICDT.2015.25
Bogdan Alexe, Douglas Burdick, Mauricio A. Hernández, Georgia Koutrika, Rajasekar Krishnamurthy, Lucian Popa, Ioana R. Stanoi, and Ryan Wisnesky. High-Level Rules for Integration and Analysis of Data: New Challenges. In LNCS 8000: In Search of Elegance in the Theory and Practice of Computation, pages 36-55, 2013.
A. Arasu, C. Re, and D. Suciu. Large-Scale Deduplication with Constraints using Dedupalog. In ICDE, pages 952-963, 2009.
M. Arenas, P. Barceló, R. Fagin, and L. Libkin. Solutions and Query Rewriting in Data Exchange. Inf. Comp., pages 28-51, 2013.
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent Query Answers in Inconsistent Databases. In PODS, pages 68-79, 1999.
Leopoldo E. Bertossi, Solmaz Kolahi, and Laks V. S. Lakshmanan. Data Cleaning and Query Answering with Matching Dependencies and Matching Functions. Theory of Computing Systems, 52(3):441-482, 2013.
Indrajit Bhattacharya and Lise Getoor. Collective Entity Resolution in Relational Data. TKDD, 1(1), 2007.
Laura Chiticariu, Yunyao Li, and Frederick R. Reiss. Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! In EMNLP, pages 827-832, 2013.
Jan Chomicki and Jerzy Marcinkowski. Minimal-Change Integrity Maintenance using Tuple Deletions. Inf. Comp., 197:90-121, 2005.
Xin Dong, Alon Y. Halevy, and Jayant Madhavan. Reference Reconciliation in Complex Information Spaces. In SIGMOD, pages 85-96, 2005.
J. Edmonds. Maximum Matching and a Polyhedron with 0,1-vertices. Journal of Research National Bureau of Standards Section B, 69:125-130, 1965.
Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, and Vassilios S. Verykios. Duplicate Record Detection: A Survey. IEEE TKDE, 19(1):1-16, 2007.
R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. Theoretical Computer Science (TCS), 336(1):89-124, 2005.
Wenfei Fan. Dependencies Revisited for Improving Data Quality. In PODS, pages 159-170, 2008.
Wenfei Fan and Floris Geerts. Foundations of Data Quality Management. Morgan & Claypool Publishers, 2012.
I. P. Fellegi and A. B. Sunter. A Theory for Record Linkage. J. Am. Statistical Assoc., 64(328):1183-1210, 1969.
K. Fukuda and T. Matsui. Finding All the Perfect Matchings in Bipartite Graphs. Appl. Math. Lett., 7(1):15-18, 1994.
H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. In VLDB, pages 371-380, 2001.
Venkatesh Ganti and Anish Das Sarma. Data Cleaning: A Practical Perspective. Morgan & Claypool Publishers, 2013.
Lise Getoor and Ashwin Machanavajjhala. Entity Resolution: Theory, Practice & Open Challenges. PVLDB, 5(12):2018-2019, 2012.
Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J. Miller, and Min Wang. A Framework for Semantic Link Discovery over Relational Data. In CIKM, pages 1027-1036, 2009.
Mauricio A. Hernández, Georgia Koutrika, Rajasekar Krishnamurthy, Lucian Popa, and Ryan Wisnesky. HIL: A High-Level Scripting Language for Entity Integration. In EDBT, pages 549-560, 2013.
Mauricio A. Hernández and Salvatore J. Stolfo. The Merge/Purge Problem for Large Databases. In SIGMOD, pages 127-138, 1995.
IBM InfoSphere QualityStage. URL: http://www.ibm.com/software/products/en/ibminfoqual.
http://www.ibm.com/software/products/en/ibminfoqual
D.S. Johnson, C.H. Papadimitriou, and M. Yannakakis. On Generating All Maximal Independent Sets. Inf. Process. Lett., 27(3):119-123, 1988.
Peter Jonsson and Andrei A. Krokhin. Recognizing Frozen Variables in Constraint Satisfaction Problems. Theoretical Computer Science (TCS), 329(1-3):93-113, 2004.
Nick Koudas, Sunita Sarawagi, and Divesh Srivastava. Record Linkage: Similarity Measures and Algorithms. In SIGMOD, pages 802-803, 2006.
K.G. Murty. An Algorithm for Ranking All the Assignments in Order of Increasing Cost. Operations Research, 16(3):682-687, 1968.
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
Matthew Richardson and Pedro Domingos. Markov Logic Networks. Machine Learning, 62(1-2):107-136, 2006.
Parag Singla and Pedro Domingos. Entity Resolution with Markov Logic. In ICDM, pages 572-582, 2006.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Asymptotic Determinacy of Path Queries using Union-of-Paths Views
We consider the view determinacy problem over graph databases for queries defined as (possibly infinite) unions of path queries. These queries select pairs of nodes in a graph that are connected through a path whose length falls in a given set. A view specification is a set of such queries. We say that a view specification V determines a query Q if, for all databases D, the answers to V on D contain enough information to answer Q.
Our main result states that, given a view V, there exists an explicit bound that depends on V such that we can decide the determinacy problem for all queries that ask for a path longer than this bound, and provide first-order rewritings for the queries that are determined. We call this notion asymptotic determinacy. As a corollary, we can also compute the set of almost all path queries that are determined by V.
Graph databases
Views
Determinacy
Rewriting
Path queries
44-59
Regular Paper
Nadime
Francis
Nadime Francis
10.4230/LIPIcs.ICDT.2015.44
Serge Abiteboul and Oliver M. Duschka. Complexity of answering queries using materialized views. In ACM Symp. on Principles of Database Systems (PODS), pages 254-263, 1998.
Foto N. Afrati. Determinacy and query rewriting for conjunctive queries and views. Theoretical Computer Science, 412(11):1005-1021, 2011.
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Lossless regular views. In ACM Symp. on Principles of Database Systems (PODS), pages 247-258. ACM, 2002.
Nadime Francis, Luc Segoufin, and Cristina Sirangelo. Datalog rewritings of regular path queries using views. In Proceedings of the 17th International Conference on Database Theory (ICDT'14), pages 107-118, 2014.
Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, and Divesh Srivastava. Answering queries using views. In ACM Symp. on Principles of Database Systems (PODS), pages 95-104, 1995.
Alan Nash, Luc Segoufin, and Victor Vianu. Views and queries: Determinacy and rewriting. ACM Transactions on Database Systems, 35(3), 2010.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Games for Active XML Revisited
The paper studies the rewriting mechanisms for intensional documents in the Active XML framework, abstracted in the form of active context-free games. The safe rewriting problem studied in this paper is to decide whether the first player, Juliet, has a winning strategy for a given game and (nested) word; this corresponds to a successful rewriting strategy for a given intensional document. The paper examines several extensions to active context-free games.
The primary extension allows more expressive schemas (namely XML schemas and regular nested word languages) for both target and replacement languages and has the effect that games are played on nested words instead of (flat) words as in previous studies. Other extensions consider validation of input parameters of web services, and an alternative semantics based on insertion of service call results.
In general, the complexity of the safe rewriting problem is highly intractable (doubly exponential time), but the paper identifies interesting tractable cases.
Active XML
Computational Complexity
Nested Words
Rewriting Games
Semistructured Data
60-75
Regular Paper
Martin
Schuster
Martin Schuster
Thomas
Schwentick
Thomas Schwentick
10.4230/LIPIcs.ICDT.2015.60
Serge Abiteboul, Omar Benjelloun, and Tova Milo. The Active XML project: an overview. VLDB J., 17(5):1019-1040, 2008.
Serge Abiteboul, Tova Milo, and Omar Benjelloun. Regular rewriting of active XML and unambiguity. In PODS, pages 295-303, 2005.
Rajeev Alur and P. Madhusudan. Adding nesting structure to words. J. ACM, 56(3), 2009.
Henrik Björklund, Martin Schuster, Thomas Schwentick, and Joscha Kulbatzki. On optimum left-to-right strategies for active context-free games. In Joint 2013 EDBT/ICDT Conferences, ICDT '13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 105-116, 2013.
Laura Bozzelli. Alternating automata and a temporal fixpoint calculus for visibly pushdown languages. In CONCUR - Concurrency Theory, 18th International Conference, pages 476-491, 2007.
E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games. A Guide to Current Research. Springer, 2002.
Lukasz Kaiser. Synthesis for structure rewriting systems. In Rastislav Královic and Damian Niwinski, editors, MFCS, volume 5734 of Lecture Notes in Computer Science, pages 415-426. Springer, 2009.
Wim Martens, Frank Neven, Thomas Schwentick, and Geert Jan Bex. Expressiveness and complexity of XML schema. ACM Trans. Database Syst., 31(3):770-813, 2006.
Tova Milo, Serge Abiteboul, Bernd Amann, Omar Benjelloun, and Frederic Dang Ngoc. Exchanging intensional XML data. ACM Trans. Database Syst., 30(1):1-40, 2005.
Makoto Murata, Dongwon Lee, Murali Mani, and Kohsuke Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn., 5(4):660-704, 2005.
Anca Muscholl, Thomas Schwentick, and Luc Segoufin. Active context-free games. Theory Comput. Syst., 39(1):237-276, 2006.
Martin Schuster and Thomas Schwentick. Games for Active XML revisited. CoRR, abs/1412.5910, 2014. Available online at URL: http://arxiv.org/abs/1412.5910.
http://arxiv.org/abs/1412.5910
Johannes Waldmann. Rewrite games. In Sophie Tison, editor, RTA, volume 2378 of Lecture Notes in Computer Science, pages 144-158. Springer, 2002.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Answering Conjunctive Queries with Inequalities
In this paper, we study the complexity of answering conjunctive queries (CQ) with inequalities. In particular, we compare the complexity of the query with and without inequalities. The main contribution of our work is a novel combinatorial technique that enables the use of any Select-Project-Join query plan for a given CQ without inequalities in answering the CQ with inequalities, with an additional factor in running time that only depends on the query. To achieve this, we define a new projection operator that keeps a small representation (independent of the size of the database) of the set of input tuples that map to each tuple in the output of the projection; this representation is used to evaluate all the inequalities in the query. Second, we generalize a result by Papadimitriou-Yannakakis [PODS'97] and give an alternative algorithm based on the color-coding technique [Alon, Yuster and Zwick, PODS'02] to evaluate a CQ with inequalities by using an algorithm for the CQ without inequalities. Third, we investigate the structure of the query graph, inequality graph, and the augmented query graph with inequalities, and show that even if the query and the inequality graphs have bounded treewidth, the augmented graph not only can have an unbounded treewidth but can also be NP-hard to evaluate. Further, we illustrate classes of queries and inequalities where the augmented graphs have unbounded treewidth, but the CQ with inequalities can be evaluated in poly-time. Finally, we give necessary properties and sufficient properties that allow a class of CQs to have poly-time combined complexity with respect to any inequality pattern.
query evaluation
conjunctive query
inequality
treewidth
76-93
Regular Paper
Paraschos
Koutris
Paraschos Koutris
Tova
Milo
Tova Milo
Sudeepa
Roy
Sudeepa Roy
Dan
Suciu
Dan Suciu
10.4230/LIPIcs.ICDT.2015.76
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
Foto Afrati, Chen Li, and Prasenjit Mitra. Answering queries using views with arithmetic comparisons. In PODS, pages 209-220, 2002.
Noga Alon, Raphael Yuster, and Uri Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997.
Noga Alon, Raphael Yuster, and Uri Zwick. Color coding. In Ming-Yang Kao, editor, Encyclopedia of Algorithms. Springer, 2008.
Albert Atserias, Martin Grohe, and Daniel Marx. Size bounds and query plans for relational joins. FOCS, pages 739-748, 2008.
Chandra Chekuri and Anand Rajaraman. Conjunctive query containment revisited. Theor. Comput. Sci., 239(2):211-229, 2000.
Marc Demange and Dominique De Werra. On some coloring problems in grids. Theor. Comput. Sci., 472:9-27, February 2013.
Arnaud Durand and Etienne Grandjean. The complexity of acyclic conjunctive queries revisited. CoRR, abs/cs/0605008, 2006.
Jörg Flum, Markus Frick, and Martin Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716-752, November 2002.
Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. In PODS, pages 21-32, 1999.
M.H. Graham. On the universal relation. Technical Report, University of Toronto, Ontario, Canada, 1979.
Martin Grohe and Dániel Marx. Constraint solving via fractional edge covers. In SODA, pages 289-298, 2006.
Klaus Jansen and Petra Scheffler. Generalized coloring for tree-like graphs. Discrete Applied Mathematics, 75(2):135-155, 1997.
Phokion G. Kolaitis, David L. Martin, and Madhukar N. Thakur. On the complexity of the containment problem for conjunctive queries with built-in predicates. In PODS, pages 197-204, 1998.
Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu. Answering conjunctive queries with inequalities. CoRR, abs/1412.3869, 2014.
B. Monien. How to find long paths efficiently. In G. Ausiello and M. Lucertini, editors, Analysis and Design of Algorithms for Combinatorial Problems, volume 109 of North-Holland Mathematics Studies, pages 239-254. North-Holland, 1985.
Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms: [extended abstract]. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, pages 37-48, 2012.
Christos H. Papadimitriou and Mihalis Yannakakis. On the complexity of database queries. In PODS, pages 12-19, 1997.
Neil Robertson and P.D. Seymour. Graph minors. III. Planar tree-width. Journal of Combinatorial Theory, Series B, 36(1):49-64, 1984.
Riccardo Rosati. The limits of querying ontologies. In ICDT, pages 164-178, 2007.
Ron van der Meyden. The complexity of querying indefinite data about linearly ordered domains. J. Comput. Syst. Sci., 54(1):113-135, February 1997.
Todd L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, pages 96-106, 2014.
Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB, pages 82-94. IEEE Computer Society, 1981.
C.T. Yu and M. Z. Ozsoyoglu. An algorithm for tree-query membership of a distributed query. In COMPSAC, pages 306-312, 1979.
Raphael Yuster and Uri Zwick. Finding even cycles even faster. SIAM J. Discrete Math., 10(2):209-222, 1997.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
SQL's Three-Valued Logic and Certain Answers
SQL uses three-valued logic for evaluating queries on databases with nulls. The standard theoretical approach to evaluating queries on incomplete databases is to compute certain answers. While these two cannot coincide, due to a significant complexity mismatch, we can still ask whether the two schemes are related in any way. For instance, does SQL always produce answers we can be certain about?
This is not so: SQL's and certain answers semantics could be totally unrelated. We show, however, that a slight modification of the three-valued semantics for relational calculus queries can provide the required certainty guarantees. The key point of the new scheme is to fully utilize the three-valued semantics, and classify answers not into certain or non-certain, as was done before, but rather into certainly true, certainly false, or unknown. This yields relatively small changes to the evaluation procedure, which we consider at the level of both declarative (relational calculus) and procedural (relational algebra) queries. We also introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values.
Null values
incomplete information
query evaluation
three-valued logic
certain answers
94-109
Regular Paper
Leonid
Libkin
Leonid Libkin
10.4230/LIPIcs.ICDT.2015.94
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
A Trichotomy in the Complexity of Counting Answers to Conjunctive Queries
Conjunctive queries are basic and heavily studied database queries; in relational algebra, they are the select-project-join queries. In this article, we study the fundamental problem of counting, given a conjunctive query and a relational database, the number of answers to the query on the database. In particular, we study the complexity of this problem relative to sets of conjunctive queries. We present a trichotomy theorem, which shows essentially that this problem on a set of conjunctive queries is either tractable, equivalent to the parameterized CLIQUE problem, or as hard as the parameterized counting CLIQUE problem; the criterion describing which of these situations occurs is simply stated, in terms of graph-theoretic conditions.
database theory
query answering
conjunctive queries
counting complexity
110-126
Regular Paper
Hubie
Chen
Hubie Chen
Stefan
Mengel
Stefan Mengel
10.4230/LIPIcs.ICDT.2015.110
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
A.K. Chandra and P.M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC 1977, pages 77-90. ACM, 1977.
Hubie Chen. The tractability frontier of graph-like first-order query sets. CoRR, abs/1407.3429v1, 2014. Conference version appeared in the proceedings of LICS '14.
Hubie Chen and Martin Grohe. Constraint satisfaction with succinctly specified relations. J. Comput. Syst. Sci., 76(8):847-860, 2010.
Hubie Chen and Moritz Müller. One hierarchy spawns another: Graph deconstructions and the complexity classification of conjunctive queries. In LICS, 2014.
V. Dalmau and P. Jonsson. The complexity of counting homomorphisms seen from the other side. Theor. Comput. Sci., 329(1-3):315-323, 2004.
V. Dalmau, P.G. Kolaitis, and M.Y. Vardi. Constraint Satisfaction, Bounded Treewidth, and Finite-Variable Logics. In International Conference on Principles and Practice of Constraint Programming 2002, pages 310-326, 2002.
A. Durand and S. Mengel. Structural tractability of counting of solutions to conjunctive queries. Theory of Computing Systems, pages 1-48, 2014. accepted, to appear, final version available at http://dx.doi.org/10.1007/s00224-014-9543-y.
J. Flum and M. Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series, 2006.
Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci., 64(3):579-627, 2002.
Gianluigi Greco and Francesco Scarcello. Counting solutions to conjunctive queries: structural and hybrid tractability. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, pages 132-143, 2014.
M. Grohe. The complexity of homomorphism and constraint satisfaction problems seen from the other side. J. ACM, 54(1), 2007.
P. Kolaitis and M. Vardi. Conjunctive-Query Containment and Constraint Satisfaction. Journal of Computer and System Sciences, 61:302-332, 2000.
Dániel Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J. ACM, 60(6):42, 2013.
S. Mengel. Conjunctive Queries, Arithmetic Circuits and Counting Complexity. PhD thesis, University of Paderborn, 2013.
C. Papadimitriou and M. Yannakakis. On the Complexity of Database Queries. Journal of Computer and System Sciences, 58(3):407-427, 1999.
R. Pichler and S. Skritek. Tractable counting of the answers to conjunctive queries. Journal of Computer and System Sciences, 2013.
Nicole Schweikardt, Thomas Schwentick, and Luc Segoufin. Database theory: Query languages. In Mikhail J. Atallah and Marina Blanton, editors, Algorithms and Theory of Computation Handbook, volume 2: Special Topics and Techniques, chapter 19. CRC Press, second edition, Nov 2009.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Learning Tree Patterns from Example Graphs
This paper investigates the problem of learning tree patterns that return nodes with a given set of labels, from example graphs provided by the user. Example graphs are annotated by the user as being either positive or negative. The goal is then to determine whether there exists a tree pattern returning tuples of nodes with the given labels in each of the positive examples, but in none of the negative examples, and, furthermore, to find one such pattern if it exists. These are called the satisfiability and learning problems, respectively.
This paper thoroughly investigates the satisfiability and learning problems in a variety of settings. In particular, we consider example sets that (1) may contain only positive examples, or both positive and negative examples, (2) may contain directed or undirected graphs, and (3) may have multiple occurrences of labels or be uniquely labeled (to some degree). In addition, we consider tree patterns of different types that can allow, or prohibit, wildcard-labeled nodes and descendant edges. We also consider two different semantics for mapping tree patterns to graphs. The complexity of satisfiability is determined for the different combinations of settings. For cases in which satisfiability is polynomial, it is also shown that learning is polynomial (this is non-trivial, as satisfying patterns may be exponential in size). Finally, the minimal learning problem, i.e., that of finding a minimal-sized satisfying pattern, is studied for cases in which satisfiability is polynomial.
tree patterns
learning
examples
127-143
Regular Paper
Sara
Cohen
Sara Cohen
Yaacov Y.
Weiss
Yaacov Y. Weiss
10.4230/LIPIcs.ICDT.2015.127
Thomas Amoth, Paul Cull, and Prasad Tadepalli. On exact learning of unordered tree patterns. Machine Learning, 44:211-243, 2001.
Dana Angluin. Negative results for equivalence queries. Machine Learning, 5(2):121-150, July 1990.
Timos Antonopoulos, Frank Neven, and Frédéric Servais. Definability problems for graph query languages. In Proceedings of the 16th International Conference on Database Theory, pages 141-152, New York, NY, USA, 2013. ACM.
Hiroki Arimura, Hiroki Ishizaka, and Takeshi Shinohara. Learning unions of tree patterns using queries. Theor. Comput. Sci., 185(1):47-62, 1997.
Julien Carme, Michal Ceresna, and Max Goebel. Query-based learning of XPath expressions. In ICGI, 2006.
Adriane Chapman and H. V. Jagadish. Why not? In SIGMOD. ACM, 2009.
Sara Cohen and Yaacov Y. Weiss. Certain and possible XPath answers. In ICDT, 2013.
Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina, and Jennifer Widom. Synthesizing view definitions from data. In ICDT, 2010.
S. E. Dreyfus and R. A. Wagner. The Steiner problem in graphs. Networks, 1(3):195-207, 1971.
Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979.
Melanie Herschel, Mauricio A. Hernández, and Wang-Chiew Tan. Artemis: a system for analyzing missing answers. Proc. VLDB Endow., 2:1550-1553, August 2009.
Vagelis Hristidis, Yannis Papakonstantinou, and Andrey Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
Jiansheng Huang, Ting Chen, AnHai Doan, and Jeffrey F. Naughton. On the provenance of non-answers to queries over extracted data. PVLDB, 1(1):736-747, 2008.
Chuntao Jiang, Frans Coenen, and Michele Zito. A survey of frequent subgraph mining algorithms. Knowledge Eng. Review, 28(1):75-105, 2013.
Benny Kimelfeld and Phokion G. Kolaitis. The complexity of mining maximal frequent subgraphs. In PODS, 2013.
Benny Kimelfeld and Yehoshua Sagiv. Finding and approximating top-k answers in keyword proximity search. In PODS, 2006.
Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, and Hendrik Blockeel. Information extraction from web documents based on local unranked tree automaton inference. In IJCAI, 2003.
D. Kozen. Lower bounds for natural proof systems. In FOCS, 1977.
Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, and Dan Suciu. WHY SO? or WHY NO? Functional Causality for Explaining Query Answers. In Management of Uncertain Data, 2010.
Neeldhara Misra, Geevarghese Philip, Venkatesh Raman, Saket Saurabh, and Somnath Sikdar. FPT algorithms for connected feedback vertex set. J. Comb. Optim., 24(2):131-146, 2012.
Rika Okada, Satoshi Matsumoto, Tomoyuki Uchida, Yusuke Suzuki, and Takayoshi Shoudai. Exact learning of finite unions of graph patterns from queries. In Algorithmic Learning Theory, LNCS, pages 298-312. Springer Berlin Heidelberg, 2007.
Stefan Raeymaekers, Maurice Bruynooghe, and Jan Van den Bussche. Learning (k,l)-contextual tree languages for information extraction from web pages. Machine Learning, 71(2-3):155-183, June 2008.
Slawek Staworko and Piotr Wieczorek. Learning twig and path queries. In ICDT, 2012.
L. J. Stockmeyer and A. R. Meyer. Word problems requiring exponential time. In STOC, 1973.
Quoc Trung Tran and Chee-Yong Chan. How to conquer why-not questions. In SIGMOD, 2010.
Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. Query by output. In SIGMOD. ACM, 2009.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Characterizing XML Twig Queries with Examples
Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with?
We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that not only are they characterizable but also with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.
Query characterization
Query examples
Query fitting
Twig queries
144-160
Regular Paper
Slawek
Staworko
Slawek Staworko
Piotr
Wieczorek
Piotr Wieczorek
10.4230/LIPIcs.ICDT.2015.144
A. Abouzied, D. Angluin, Ch. Papadimitriou, J. M. Hellerstein, and A. Silberschatz. Learning and verifying quantified boolean queries by example. In Proceedings of the 32nd Symposium on Principles of Database Systems, PODS '13, pages 49-60. ACM, 2013.
S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan, and D. Srivastava. Tree pattern query minimization. VLDB Journal, 11(4):315-331, 2002.
M. Anthony, G. Brightwell, D. Cohen, and J. Shawe-Taylor. On exact specification by examples. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pages 311-318, New York, NY, USA, 1992. ACM.
S. Cho, S. Amer-Yahia, L. V. S. Lakshmanan, and D. Srivastava. Optimizing the secure evaluation of twig queries. In International Conference on Very Large Data Bases (VLDB), pages 490-501, 2002.
S. Cohen and Y. Y. Weiss. Certain and possible XPath answers. In International Conference on Database Theory (ICDT), 2013.
C. de la Higuera. Characteristic sets for polynomial grammatical inference. Machine Learning, 27(2):125-138, 1997.
E. M. Gold. Language identification in the limit. Information and Control, 10(5):447-474, 1967.
E. M. Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302-320, 1978.
S. A. Goldman and M. J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20-31, 1995.
S. A. Goldman, R. L. Rivest, and R. E. Schapire. Learning binary relations and total orders. SIAM J. Comput., 22(5):1006-1034, 1993.
B. Kimelfeld and Y. Sagiv. Revisiting redundancy and minimization in an XPath fragment. In EDBT 2008, 11th International Conference on Extending Database Technology, pages 61-72, 2008.
J. Michaliszyn, A. Muscholl, S. Staworko, P. Wieczorek, and Z. Wu. On injective embeddings of tree patterns. CoRR, abs/1204.4948, 2012.
G. Miklau and D. Suciu. Containment and equivalence for a fragment of XPath. Journal of the ACM, 51(1):2-45, 2004.
F. Neven. Automata, logic, and XML. In Workshop on Computer Science Logic (CSL), volume 2471 of Lecture Notes in Computer Science, pages 2-26. Springer, 2002.
F. Neven and T. Schwentick. XPath containment in the presence of disjunction, DTDs, and variables. In International Conference on Database Theory (ICDT), pages 315-329. Springer-Verlag, 2003.
S. Salzberg, A. L. Delcher, D. G. Heath, and S. Kasif. Learning with a helpful teacher. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, pages 705-711, 1991.
T. Schwentick. XPath query containment. SIGMOD Record, 33(1):101-109, 2004.
A. Shinohara and S. Miyano. Teachability in computational learning. New Generation Comput., 8(4):337-347, 1991.
S. Staworko and P. Wieczorek. Learning twig and path queries. In International Conference on Database Theory (ICDT), March 2012.
B. Ten Cate, V. Dalmau, and P. Kolaitis. Learning schema mappings. In International Conference on Database Theory (ICDT), March 2012.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
The Product Homomorphism Problem and Applications
The product homomorphism problem (PHP) takes as input a finite collection of structures A_1, ..., A_n and a structure B, and asks if there is a homomorphism from the direct product of A_1, ..., A_n to B. We pinpoint the computational complexity of this problem. Our motivation stems from the fact that PHP naturally arises in different areas of database theory. In particular, it is equivalent to the problem of determining whether a relation is definable by a conjunctive query, and to the problem of deciding the existence of a schema mapping that fits a given collection of positive and negative data examples. We apply our results to obtain complexity bounds for these problems.
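The problem statement in the abstract above can be illustrated with a small brute-force sketch. It is an exhaustive search for intuition only, not the paper's method, and it assumes every structure is a directed graph given as a pair (universe, edge set). The helper names `cycle`, `direct_product`, and `has_homomorphism` are hypothetical.

```python
from itertools import product


def cycle(n):
    """Directed cycle on n vertices, as (universe, edges)."""
    return list(range(n)), {(i, (i + 1) % n) for i in range(n)}


def direct_product(structures):
    """Direct product: tuples of elements, with an edge between two
    tuples exactly when every component pair is an edge."""
    universe = list(product(*(u for u, _ in structures)))
    edge_sets = [e for _, e in structures]
    edges = {(s, t) for s in universe for t in universe
             if all((s[i], t[i]) in edge_sets[i] for i in range(len(structures)))}
    return universe, edges


def has_homomorphism(source, target):
    """Try every map from source to target (exponential time)."""
    (src_universe, src_edges), (tgt_universe, tgt_edges) = source, target
    for image in product(tgt_universe, repeat=len(src_universe)):
        h = dict(zip(src_universe, image))
        if all((h[s], h[t]) in tgt_edges for s, t in src_edges):
            return True
    return False


prod = direct_product([cycle(2), cycle(3)])
print(has_homomorphism(prod, cycle(6)))  # product of C2 and C3 is a directed 6-cycle
print(has_homomorphism(prod, cycle(4)))
```

In this example neither factor maps homomorphically to the directed 6-cycle on its own, but their product does. The product is already exponential in the number of input structures before any search starts.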
Homomorphisms
Direct Product
Data Examples
Definability
Conjunctive Queries
Schema Mappings
161-176
Regular Paper
Balder
ten Cate
Balder ten Cate
Victor
Dalmau
Victor Dalmau
10.4230/LIPIcs.ICDT.2015.161
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan. Designing and refining schema mappings via data examples. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, SIGMOD '11, pages 133-144, New York, NY, USA, 2011. ACM.
Timos Antonopoulos, Frank Neven, and Frédéric Servais. Definability problems for graph query languages. In Proceedings of the 16th International Conference on Database Theory, ICDT '13, pages 141-152, New York, NY, USA, 2013. ACM.
F. Bancilhon. On the completeness of query languages for relational databases. In Proceedings of MFCS, pages 112-123, 1978.
Egon Börger, Erich Grädel, and Yuri Gurevich. The Classical Decision Problem. Perspectives in Mathematical Logic. Springer, 1997.
Nadia Creignou, Phokion G. Kolaitis, and Bruno Zanuttini. Structure identification of boolean relations and plain bases for co-clones. J. Comput. Syst. Sci., 74(7):1103-1115, 2008.
Victor Dalmau. Computational Complexity of Problems over Generalized Formulas. PhD thesis, Universitat Politècnica de Catalunya, 2000.
Rina Dechter and Judea Pearl. Structure identification in relational data. Artif. Intell., 58(1-3):237-270, 1992.
Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89-124, 2005. Database Theory.
G.H.L. Fletcher, M. Gyssens, J. Paredaens, and D. Van Gucht. On the expressive power of the relational algebra on finite sets of relation pairs. IEEE Transactions on Knowledge and Data Engineering, 21(6):939-942, June 2009.
Peter Jeavons, David A. Cohen, and Marc Gyssens. How to determine the expressive power of constraints. Constraints, 4(2):113-131, 1999.
Phokion G. Kolaitis. Schema mappings, data exchange, and metadata management. In Proceedings of the Twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '05, pages 61-75, New York, NY, USA, 2005. ACM.
J. Paredaens. On the expressive power of the relational algebra. Information Processing Letters, 7(2):107-111, 1978.
Ross Willard. Testing expressibility is hard. In David Cohen, editor, CP, volume 6308 of Lecture Notes in Computer Science, pages 9-23. Springer, 2010.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Regular Queries on Graph Databases
Graph databases are currently one of the most popular paradigms for storing data. One of the key conceptual differences between graph and relational databases is the focus on navigational queries that ask whether some nodes are connected by paths satisfying certain restrictions. This focus has driven the definition of several different query languages and the subsequent study of their fundamental properties.
We define the graph query language of Regular Queries, which is a natural extension of unions of conjunctive 2-way regular path queries (UC2RPQs) and unions of conjunctive nested 2-way regular path queries (UCN2RPQs). Regular queries allow expressing complex regular patterns between nodes. We formalize regular queries as nonrecursive Datalog programs with transitive closure rules. This language has been previously considered, but its algorithmic properties are not well understood.
Our main contribution is to show elementary tight bounds for the containment problem for regular queries. Specifically, we show that this problem is 2EXPSPACE-complete. For all extensions of regular queries known to date, the containment problem turns out to be non-elementary. Together with the fact that evaluating regular queries is not harder than evaluating UCN2RPQs, our results show that regular queries achieve a good balance between expressiveness and complexity, and constitute a well-behaved class that deserves further investigation.
graph databases
conjunctive regular path queries
regular queries
containment
177-194
Regular Paper
Juan L.
Reutter
Juan L. Reutter
Miguel
Romero
Miguel Romero
Moshe Y.
Vardi
Moshe Y. Vardi
10.4230/LIPIcs.ICDT.2015.177
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Complexity and Expressiveness of ShEx for RDF
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. A ShEx assigns types to the nodes of an RDF graph and makes it possible to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
RDF
Schema
Graph topology
Validation
Complexity
Expressiveness
195-211
Regular Paper
Slawek
Staworko
Slawek Staworko
Iovka
Boneva
Iovka Boneva
Jose E.
Labra Gayo
Jose E. Labra Gayo
Samuel
Hym
Samuel Hym
Eric G.
Prud'hommeaux
Eric G. Prud'hommeaux
Harold
Solbrig
Harold Solbrig
10.4230/LIPIcs.ICDT.2015.195
M. Arenas, C. Gutierrez, and J. Pérez. Foundations of RDF databases. In Reasoning Web, Int'l Summer School on Semantic Technologies for Information Systems, pages 158-204, 2009. Invited Tutorial.
M. Arenas, J. Pérez, J. Reutter, C. Riveros, and J. Sequeda. Data exchange in the relational and RDF worlds. Invited talk at the Int'l Workshop on Semantic Web Information Management (SWIM), June 2011.
G. J. Bex, F. Neven, T. Schwentick, and S. Vansummeren. Inference of concise regular expressions and DTDs. ACM Transactions on Database Systems, 35(2), 2010.
G. J. Bex, F. Neven, and S. Vansummeren. Inferring XML schema definitions from XML data. In Int'l Conf. on Very Large Data Bases (VLDB), pages 998-1009, 2007.
J. Bolleman, S. Gehant, and N. Redaschi. Catching inconsistencies with the semantic web: A biocuration case study. In Int'l Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS), 2012.
I. Boneva, R. Ciucanu, and S. Staworko. Schemas for unordered XML on a DIME. Theory of Computing Systems, 2014. To appear. Available at http://arxiv.org/abs/1311.7307.
I. Boneva, J. Emilio Labra Gayo, S. Hym, E. G. Prud'hommeaux, H. Solbrig, and S. Staworko. Validating RDF with shape expressions, April 2014. Available at http://arxiv.org/abs/1404.1270.
D. Brickley and R. V. Guha. RDF Schema 1.1. http://www.w3.org/TR/rdf-schema, February 2004.
http://www.w3.org/TR/rdf-schema
R. Ciucanu and S. Staworko. Learning schemas for unordered XML. In Int'l Symp. on Database Programming Languages (DBPL), 2013.
D. Colazzo, G. Ghelli, L. Pardini, and C. Sartiani. Linear inclusion for XML regular expression types. In Int'l Conf. on Information and Knowledge Management (CIKM), pages 137-146, 2009.
D. Colazzo, G. Ghelli, and C. Sartiani. Efficient inclusion for a class of XML types with interleaving and counting. Information Systems, 34(7):643-656, 2009.
B. Courcelle. The monadic second-order logic of graphs. I. Recognizable sets of finite graphs. Information and Computation, 85(1):12-75, 1990.
M. Dean and M. Schreiber. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref, February 2004.
http://www.w3.org/TR/owl-ref
H.-D. Ebbinghaus and J. Flum. Finite model theory. Springer, 1995.
J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and A. Arias. Binary RDF representation for publication and exchange (HDT). J. Web Semantics, 19:22-41, 2013.
G. Ghelli, D. Colazzo, and C. Sartiani. Linear time membership in a class of regular expressions with interleaving and counting. In Int'l Conf. on Information and Knowledge Management (CIKM), pages 389-398, 2008.
S. Ginsburg and E. H. Spanier. Semigroups, Presburger formulas, and languages. Pacific Journal of Mathematics, 16(2):285-296, December 1966.
B. Glimm and C. Ogbuji. SPARQL 1.1 Entailment Regimes. http://www.w3.org/TR/sparql11-entailment/, 2012.
http://www.w3.org/TR/sparql11-entailment/
A. V. Goldberg, E. Tardos, and R. E. Tarjan. Network flow algorithms. In Algorithms and Complexity, Volume 9, Paths, Flows, and VLSI-Layout, 1990.
B. Groz, S. Maneth, and S. Staworko. Deterministic regular expressions in linear time. In ACM Symp. on Principles of Database Systems (PODS), May 2012.
J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 2nd edition, 2001.
E. Kopczynski and A. To. Parikh images of grammars: Complexity and applications. In LICS, pages 80-89, 2010.
D. Kozen. Lower bounds for natural proof systems. In IEEE Symp. on Foundations of Computer Science (FOCS), pages 254-266, 1977.
J. E. Labra Gayo, E. Prud'hommeaux, H. Solbrig, and J. M. Álvarez Rodríguez. Validating and describing linked data portals using RDF Shape Expressions. In Workshop on Linked Data Quality, September 2014.
M. Montazerian, P. T. Wood, and S. R. Mousavi. XPath query satisfiability is in PTIME for real-world DTDs. In Int'l XML Database Symp. (Xsym), pages 17-30, 2007.
M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn., 5(4):660-704, 2005.
D. C. Oppen. A 2^2^2^pn upper bound on the complexity of Presburger arithmetic. Journal of Computer and System Sciences, 16(3):323-332, 1978.
C. H. Papadimitriou. On the complexity of integer programming. Journal of the ACM, 28(4):765-768, October 1981.
R. J. Parikh. On context-free languages. Journal of the ACM, 13(4):570-581, 1966.
E. Prud'hommeaux, J. E. Labra Gayo, and H. Solbrig. Shape Expressions: An RDF validation and transformation language. In Int'l Conf. on Semantic Systems, Sep. 2014.
J. L. Reutter and T. Tan. A formalism for graph databases and its model of computation. In AMW, volume 749 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.
A. Ryman, A. Le Hors, and S. Speicher. OSLC resource shape: A language for defining constraints on linked data. In Proc. of the WWW2013 Workshop on Linked Data on the Web (LDOW). CEUR-WS.org, 2013.
H. Seidl, T. Schwentick, and A. Muscholl. Counting in trees. Logic and Automata, pages 575-612, 2008.
J. Sequeda, S. H. Tirmizi, Ó. Corcho, and D. P. Miranker. Survey of directly mapping SQL databases to the Semantic Web. Knowledge Engineering Review, 26(4):445-486, 2011.
E. Sirin. Data validation with OWL integrity constraints. In Int'l Conf. on Web Reasoning and Rule Systems (RR), pages 18-22, 2010.
S. Staworko and P. Wieczorek. Learning twig and path queries. In Int'l Conf. on Database Theory (ICDT), March 2012.
J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity constraints in OWL. In Int'l Conf. on Artificial Intelligence (AAAI), 2010.
J. W. Thatcher and J. B. Wright. Generalized finite automata with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2:57-82, 1968.
TPC. TPC benchmarks, URL: http://www.tpc.org/.
http://www.tpc.org/
W3C. RDF validation workshop report: Practical assurances for quality RDF data. http://www.w3.org/2012/12/rdf-val/report, September 2013.
W3C. Shape expressions schemas, 2013. http://www.w3.org/2013/ShEx/Primer.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
CONSTRUCT Queries in SPARQL
SPARQL has become the most popular language for querying RDF datasets, the standard data model for representing information in the Web. This query language has received a good deal of attention in the last few years: two versions of W3C standards have been issued, several SPARQL query engines have been deployed, and important theoretical foundations have been laid. However, many fundamental aspects of SPARQL queries are not yet fully understood. To this end, it is crucial to understand the correspondence between SPARQL and well-developed frameworks like relational algebra or first order logic. But one of the main obstacles on the way to such understanding is the fact that the well-studied fragments of SPARQL do not produce RDF as output.
In this paper we embark on the study of SPARQL CONSTRUCT queries, that is, queries which output RDF graphs. This class of queries takes its rightful place in the standards and implementations, but contrary to SELECT queries, it has not yet attracted worthwhile theoretical research. Under this framework we are able to establish a strong connection between SPARQL and well-known logical and database formalisms. In particular, the fragment which does not allow for blank nodes in output templates corresponds to first order queries, its well-designed sub-fragment corresponds to positive first order queries, and the general language can be re-stated as a data exchange setting. These correspondences allow us to conclude that the general language is not composable, but the aforementioned blank-free fragments are. Finally, we enrich SPARQL with a recursion operator and establish fundamental properties of this extension.
RDF
SPARQL
Query Languages
212-229
Regular Paper
Egor V.
Kostylev
Egor V. Kostylev
Juan L.
Reutter
Juan L. Reutter
Martín
Ugarte
Martín Ugarte
10.4230/LIPIcs.ICDT.2015.212
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of databases, volume 8. Addison-Wesley Reading, 1995.
Renzo Angles and Claudio Gutierrez. The expressive power of SPARQL. In ISWC, pages 114-129, 2008.
Marcelo Arenas, Pablo Barcelo, Leonid Libkin, and Filip Murlak. Relational and XML data exchange. Synthesis Lectures on Data Management, 2(1):1-112, 2010.
Marcelo Arenas, Sebastián Conca, and Jorge Pérez. Counting beyond a yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In Proceedings of the 21st international conference on World Wide Web, pages 629-638. ACM, 2012.
Marcelo Arenas and Jorge Pérez. Querying semantic web data with SPARQL. In PODS, pages 305-316, 2011.
Pablo Barceló Baeza. Querying graph databases. In Proceedings of the 32nd symposium on Principles of database systems, pages 175-188. ACM, 2013.
Carlos Buil-Aranda, Marcelo Arenas, and Oscar Corcho. Semantics and optimization of the SPARQL 1.1 federation extension. In The Semantic Web: Research and Applications, pages 1-15. Springer, 2011.
Melisachew Wudage Chekol, Jérôme Euzenat, Pierre Genevès, and Nabil Layaïda. SPARQL query containment under SHI axioms. In AAAI, 2012.
Melisachew Wudage Chekol, Jérôme Euzenat, Pierre Genevès, and Nabil Layaïda. SPARQL query containment under RDFS entailment regime. In IJCAR, pages 134-148, 2012.
Rada Chirkova and George HL Fletcher. Towards well-behaved schema evolution. In WebDB, 2009.
Mariano P. Consens and Alberto O. Mendelzon. GraphLog: a visual formalism for real life recursion. In Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 404-416. ACM, 1990.
Orri Erling and Ivan Mikhailov. RDF support in the Virtuoso DBMS. In Networked Knowledge-Networked Media, pages 7-24. Springer, 2009.
Ronald Fagin, Phokion G Kolaitis, Renée J Miller, and Lucian Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89-124, 2005.
Ronald Fagin, Phokion G Kolaitis, Lucian Popa, and Wang-Chiew Tan. Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems (TODS), 30(4):994-1055, 2005.
Floris Geerts, Grigoris Karvounarakis, Vassilis Christophides, and Irini Fundulaki. Algebraic structures for capturing the provenance of SPARQL queries. In ICDT, pages 153-164, 2013.
Birte Glimm and Chimezie Ogbuji. SPARQL 1.1 Entailment Regimes. W3C Recommendation, 2013. Available at URL: http://www.w3.org/TR/sparql11-entailment/.
http://www.w3.org/TR/sparql11-entailment/
Harry Halpin and James Cheney. Dynamic provenance for SPARQL updates. In ISWC, 2014.
Steve Harris, Nick Lamb, and Nigel Shadbolt. 4store: The design and implementation of a clustered RDF store. In 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2009), pages 94-109, 2009.
Aidan Hogan, Marcelo Arenas, Alejandro Mallea, and Axel Polleres. Everything you always wanted to know about blank nodes. Web Semantics: Science, Services and Agents on the World Wide Web, 2014.
Egor V. Kostylev and Bernardo Cuenca Grau. On the semantics of SPARQL queries with optional matching under entailment regimes. In ISWC, 2014.
Maurizio Lenzerini. Data integration: A theoretical perspective. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 233-246. ACM, 2002.
Andrés Letelier, Jorge Pérez, Reinhard Pichler, and Sebastian Skritek. Static analysis and optimization of semantic web queries. ACM Trans. Database Syst., 38(4):25, 2013.
Leonid Libkin, Juan Reutter, and Domagoj Vrgoč. TriAL for RDF: adapting graph query languages for RDF data. In Proceedings of the 32nd symposium on Principles of database systems, pages 201-212. ACM, 2013.
Katja Losemann and Wim Martens. The complexity of evaluating path expressions in SPARQL. In Proceedings of the 31st symposium on Principles of Database Systems, pages 101-112. ACM, 2012.
Frank Manola and Eric Miller. RDF Primer. W3C Recommendation, 10 February 2004. Available at URL: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/.
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
Paolo Missier, Khalid Belhajjame, and James Cheney. The W3C PROV family of specifications for modelling provenance metadata. In Proceedings of the 16th International Conference on Extending Database Technology, pages 773-776. ACM, 2013.
Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3), 2009.
Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. nSPARQL: A navigational language for RDF. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):255-270, 2010.
François Picalausa and Stijn Vansummeren. What are real SPARQL queries like? In SWIM, 2011.
Reinhard Pichler and Sebastian Skritek. Containment and equivalence of well-designed SPARQL. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 39-50. ACM, 2014.
Axel Polleres and Johannes Peter Wallner. On the relation between SPARQL1.1 and answer set programming. Journal of Applied Non-Classical Logics, 23(1-2):159-212, 2013.
Eric Prud'hommeaux and Andy Seaborne. SPARQL query language for RDF. W3C Recommendation, 2008. Available at URL: http://www.w3.org/TR/rdf-sparql-query/.
http://www.w3.org/TR/rdf-sparql-query/
Eric Prud'hommeaux, Andy Seaborne, et al. SPARQL query language for RDF, 2006.
Raghu Ramakrishnan and Johannes Gehrke. Database management systems, volume 3. McGraw-Hill New York, 2003.
Edward L Robertson. Triadic relations: An algebra for the semantic web. In Semantic Web and Databases, pages 91-108. Springer, 2005.
Michael Schmidt, Michael Meier, and Georg Lausen. Foundations of SPARQL query optimization. In ICDT, pages 4-33, 2010.
Andy Seaborne. ARQ-A SPARQL processor for Jena. Obtained through the Internet: http://jena.sourceforge.net/ARQ/, 2010.
Moshe Y Vardi. The complexity of relational query languages. In Proceedings of the fourteenth annual ACM symposium on Theory of computing, pages 137-146. ACM, 1982.
W3C SPARQL Working Group. SPARQL 1.1 Query language. W3C Recommendation, 21 March 2013. Available at URL: http://www.w3.org/TR/sparql11-query/.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Separability by Short Subsequences and Subwords
The separability problem for regular languages asks, given two regular languages I and E, whether there exists a language S that separates the two, that is, includes I but contains nothing from E. Typically, S comes from a simple, less expressive class of languages than I and E. In general, a simple separator S can be seen as an approximation of I or as an explanation of how I and E are different. In a database context, separators can be used for explaining the result of regular path queries or for finding explanations for the difference between paths in a graph database, that is, how paths from given nodes u_1 to v_1 are different from those from u_2 to v_2. We study the complexity of separability of regular languages by combinations of subsequences or subwords of a given length k. The rationale is that the parameter k can be used to influence the size and simplicity of the separator. The emphasis of our study is on tracing the tractability of the problem.
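To make the role of the parameter k concrete, the following minimal sketch works on finite sets of sample words rather than on regular languages, and it assumes that "combinations" means Boolean combinations of conditions of the form "contains u as a subsequence" with |u| <= k. Under that assumption, two finite sets can be separated exactly when no word of I has the same set of short subsequences as a word of E. The function names are hypothetical and this is not the algorithm studied in the paper.

```python
from itertools import combinations


def subsequences_up_to(word, k):
    """Return every (scattered) subsequence of `word` of length at most k."""
    result = set()
    for length in range(k + 1):
        for positions in combinations(range(len(word)), length):
            result.add("".join(word[i] for i in positions))
    return result


def separable(included, excluded, k):
    """Check whether two finite word sets have disjoint subsequence profiles.

    A profile is the set of subsequences of length <= k of one word. If no
    profile occurs on both sides, a Boolean combination of subsequence tests
    can tell the two sets apart.
    """
    profiles_i = {frozenset(subsequences_up_to(w, k)) for w in included}
    profiles_e = {frozenset(subsequences_up_to(w, k)) for w in excluded}
    return profiles_i.isdisjoint(profiles_e)


if __name__ == "__main__":
    # "ab" and "ba" contain the same letters, so k = 1 cannot separate them,
    # while k = 2 can (only "ab" has the subsequence "ab").
    print(separable({"ab"}, {"ba"}, 1))
    print(separable({"ab"}, {"ba"}, 2))
```

Raising k enlarges the profiles, which can only make separation easier but also makes the separator larger; this is the size-versus-simplicity trade-off the abstract refers to.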
separability
complexity
graph data
debugging
230-246
Regular Paper
Piotr
Hofman
Piotr Hofman
Wim
Martens
Wim Martens
10.4230/LIPIcs.ICDT.2015.230
E. Botoeva, R. Kontchakov, V. Ryzhikov, F. Wolter, and M. Zakharyaschev. Query inseparability for description logic knowledge bases. In Principles of Knowledge Representation and Reasoning (KR), 2014.
J. Brzozowski and I. Simon. Characterizations of locally testable events. Discrete Mathematics, 4(3):243-271, 1973.
P. Buneman and W. C. Tan. Provenance in databases. In International Conference on Management of Data (SIGMOD), pages 1171-1173, 2007.
W. Craig. Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. The Journal of Symbolic Logic, 22(3), 1957.
W. Czerwinski, W. Martens, and T. Masopust. Efficient separability of regular languages by subsequences and suffixes. In International Conference on Automata, Languages and Programming (ICALP), pages 150-161, 2013.
T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan. Abstractions from proofs. In Principles of Programming Languages (POPL), pages 232-244, 2004.
E. Kopczynski and A. Widjaja To. Parikh images of grammars: Complexity and applications. In Logic in Computer Science (LICS), pages 80-89, 2010.
C. Lutz and F. Wolter. Foundations for uniform interpolation and forgetting in expressive description logics. In International Joint Conference on Artificial Intelligence (IJCAI), pages 989-995, 2011.
T. Masopust and M. Thomazo. On k-piecewise testability (preliminary report). CoRR, abs/1412.1641, 2014.
K. L. McMillan. Applications of Craig interpolants in model checking. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 1-12, 2005.
R. McNaughton. Algebraic decision procedures for local testability. Mathematical Systems Theory, 8(1):60-76, 1974.
R. McNaughton and S. Papert. Counter-free automata. The M.I.T. Press, 1971.
R. Paige and R. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 16:973-989, 1987.
T. Place, L. van Rooijen, and M. Zeitoun. Separating regular languages by locally testable and locally threshold testable languages. In Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pages 363-375, 2013.
T. Place, L. van Rooijen, and M. Zeitoun. Separating regular languages by piecewise testable and unambiguous languages. In Mathematical Foundations of Computer Science (MFCS), pages 729-740, 2013.
T. Place and M. Zeitoun. Separating regular languages with first-order logic. In Computer Science Logic - Logic in Computer Science (CSL-LICS), 2014.
L. Van Rooijen. Une approche combinatoire du problème de séparation pour les langages réguliers. PhD thesis, Université de Bordeaux, 2014.
S. Roy and D. Suciu. A formal approach to finding explanations for database queries. In International Conference on Management of Data (SIGMOD), pages 1579-1590, 2014.
I. Simon. Hierarchies of Events with Dot-Depth One. PhD thesis, Dept. of Applied Analysis and Computer Science, University of Waterloo, Canada, 1972.
I. Simon. Piecewise testable events. In Proceedings of GI Conference on Automata Theory and Formal Languages, pages 214-222. Springer, 1975.
J. Stern. Complexity of some problems from the theory of automata. Information and Control, 66(3):163-176, 1985.
L. Stockmeyer and A. Meyer. Word problems requiring exponential time: Preliminary report. In Symposium on Theory of Computing (STOC), pages 1-9, 1973.
W. C. Tan. Provenance in databases: Past, current, and future. IEEE Data Engineering Bulletin, 30(4):3-12, 2007.
Š. Holub, G. Jirásková, and T. Masopust. On upper and lower bounds on the length of alternating towers. In Mathematical Foundations of Computer Science (MFCS), Part I, pages 315-326, 2014.
P. T. Wood. Containment for XPath fragments under DTD constraints. In International Conference on Database Theory (ICDT), 2003. Full version, obtained through personal communication.
Y. Zalcstein. Locally testable languages. Journal of Computer and System Sciences, 6(2):151-167, 1972.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Process-Centric Views of Data-Driven Business Artifacts
Declarative, data-aware workflow models are becoming increasingly pervasive. While these have numerous benefits, classical process-centric specifications retain certain advantages. Workflow designers are used to development tools, such as BPMN or UML diagrams, that focus on control flow. Views describing valid sequences of tasks are also useful to provide stakeholders with high-level descriptions of the workflow, stripped of the accompanying data. In this paper we study the problem of recovering process-centric views from declarative, data-aware workflow specifications in a variant of IBM's business artifact model. We focus on the simplest and most natural process-centric views, specified by finite-state transition systems, and describing regular languages. The results characterize when process-centric views of artifact systems are regular, using both linear and branching-time semantics. We also study the impact of data dependencies on regularity of the views.
Workflows
data-aware
process-centric
views
247-264
Regular Paper
Adrien
Koutsos
Adrien Koutsos
Victor
Vianu
Victor Vianu
10.4230/LIPIcs.ICDT.2015.247
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison Wesley, 1995.
F. Belardinelli, A. Lomuscio, and F. Patrizi. An abstraction technique for the verification of artifact-centric systems. In Proc. Intl. Conf. on Knowledge Representation, 2012.
K. Bhattacharya, N. S. Caswell, S. Kumaran, A. Nigam, and F. Y. Wu. Artifact-centered operational modeling: Lessons from customer engagements. IBM Sys. Journal, 46(4), 2007.
K. Bhattacharya et al. A model-driven approach to industrializing discovery processes in pharmaceutical research. IBM Systems Journal, 44(1), 2005.
BizAgi and Cordys and IBM and Oracle and SAP AG and Singularity (OMG Submitters) and Agile Enterprise Design and Stiftelsen SINTEF and TIBCO and Trisotech (Co-Authors). Case Management Model and Notation (CMMN), FTF Beta 1, Jan. 2013. OMG Document Number dtc/2013-01-01, Object Management Group.
L. Boasson and M. Nivat. Adherences of languages. J. Comput. System Sci., 20(3), 1980.
A. Bozzon, M. Brambilla, S. Ceri, and A. Mauri. Reactive crowdsourcing. In 22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 153-164, 2013.
A. Bozzon, M. Brambilla, S. Ceri, A. Mauri, and R. Volonterio. Pattern-based specification of crowdsourcing applications. In Web Engineering, 14th International Conference, ICWE 2014, Toulouse, France, July 1-4, 2014. Proceedings, pages 218-235, 2014.
T. Chao et al. Artifact-based transformation of IBM Global Financing: A case study. In BPM, 2009.
E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. MIT Press, 2000.
E. Damaggio, A. Deutsch, and V. Vianu. Artifact systems with data dependencies and arithmetic. ACM Transactions on Database Systems, 37(3), 2012. Preliminary version in ICDT 2011.
E. Damaggio, R. Hull, and R. Vaculín. On the equivalence of incremental and fixpoint semantics for business artifacts with guard-stage-milestone lifecycles. Information Systems, 38:561-584, 2013.
G. De Giacomo, R. De Masellis, and R. Rosati. Verification of conjunctive artifact-centric services. Int. J. Cooperative Inf. Syst., 21(2):111-140, 2012.
H. de Man. Case management: Cordys approach. BP Trends (www.bptrends.com), 2009.
S. Demri and R. Lazić. LTL with the Freeze Quantifier and Register Automata. In LICS, 2006.
S. Demri, R. Lazić, and A. Sangnier. Model checking freeze LTL over one-counter automata. In FoSSaCS, 2008.
A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In ICDT, 2009.
A. Deutsch, R. Hull, and V. Vianu. Automatic verification of data-driven systems. Sigmod Record, 2014.
A. Deutsch, Y. Li, and V. Vianu. Hierarchical artifact systems. In preparation.
A. Deutsch, L. Sui, and V. Vianu. Specification and verification of data-driven web applications. JCSS, 73(3):442-474, 2007.
E. Allen Emerson. Temporal and modal logic. In J. Van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, pages 995-1072. North-Holland Pub. Co./MIT Press, 1990.
B. Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. In PODS, 2013.
R. Hull, E. Damaggio, R. De Masellis, F. Fournier, M. Gupta, F. Heath III, S. Hobson, M. Linehan, S. Maradugu, A. Nigam, P. Sukaviriya, and R. Vaculín. Business artifacts with guard-stage-milestone lifecycles: Managing artifact interactions with conditions and events. In ACM DEBS, 2011.
Dimitri Isaak and Christof Löding. Efficient inclusion testing for simple classes of unambiguous ω-automata. Inf. Process. Lett., 112(14-15), 2012.
S. Kumaran, P. Nandi, T. Heath, K. Bhaskaran, and R. Das. ADoc-oriented programming. In Symp. on Applications and the Internet (SAINT), 2003.
Leonid Libkin. Elements of Finite Model Theory. Springer, 2004.
A. Lomuscio and J. Michaliszyn. Model checking unbounded artifact-centric systems. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourteenth International Conference, KR 2014, Vienna, Austria, July 20-24, 2014, 2014.
M. Marin, R. Hull, and R. Vaculín. Data centric BPM and the emerging case management standard: A short survey. In BPM Workshops, 2012.
S. Merz. Model checking: a tutorial overview. In Modeling and verification of parallel processes. Springer-Verlag New York, 2001.
Marvin L. Minsky. Computation: finite and infinite machines. Prentice-Hall, 1967.
A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification. IBM Systems Journal, 42(3), 2003.
Amir Pnueli. The temporal logic of programs. In FOCS, 1977.
E. L. Post. Recursive unsolvability of a problem of Thue. J. of Symbolic Logic, 12:1-11, 1947.
L. Segoufin and S. Torunczyk. Automata based verification over linearly ordered data domains. In STACS, 2011.
M. Spielmann. Verification of relational transducers for electronic commerce. JCSS, 66(1):40-65, 2003.
Wolfgang Thomas. Automata on infinite objects. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science (Vol. B). Elsevier, 1990.
W. van der Aalst and M. Song. Mining social networks: Uncovering interaction patterns in business processes. In Business Process Management, volume 3080 of Lecture Notes in Computer Science, pages 244-260. Springer Berlin Heidelberg, 2004.
W. van der Aalst and A. ter Hofstede. YAWL: Yet another workflow language. Information Systems, 30(4), 2005.
W.-D. Zhu et al. Advanced Case Management with IBM Case Manager. Available at http://www.redbooks.ibm.com/abstracts/sg247929.html?Open.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
On The I/O Complexity of Dynamic Distinct Counting
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains S in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.
In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps dramatically from 0 to almost linear as soon as the insertion cost drops slightly below Omega(1).
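The trade-off between the two baseline solutions can be illustrated with a rough in-memory analogue. This is only a sketch: it does not model blocks or I/Os, a Python set stands in for the hash structure, a Python list stands in for the linked list, and the class names are hypothetical.

```python
class HashedDistinctCounter:
    """Pays on every insertion (a lookup) so that a query costs nothing."""

    def __init__(self):
        self._seen = set()
        self._count = 0

    def insert(self, x):
        # The membership test is the per-insertion cost that the
        # external-memory hash structure pays as O(1) expected I/Os.
        if x not in self._seen:
            self._seen.add(x)
            self._count += 1

    def query(self):
        return self._count


class BufferedDistinctCounter:
    """Appends blindly on insertion and pays for a sort at query time."""

    def __init__(self):
        self._log = []

    def insert(self, x):
        self._log.append(x)

    def query(self):
        ordered = sorted(self._log)
        distinct = 0
        for i, value in enumerate(ordered):
            # After sorting, equal values are adjacent, so a new distinct
            # element starts wherever the value changes.
            if i == 0 or value != ordered[i - 1]:
                distinct += 1
        return distinct


if __name__ == "__main__":
    hashed, buffered = HashedDistinctCounter(), BufferedDistinctCounter()
    for x in [3, 1, 3, 2, 1]:
        hashed.insert(x)
        buffered.insert(x)
    print(hashed.query(), buffered.query())
```

The lower bound stated in the abstract says, informally, that no structure sits comfortably between these two extremes.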
distinct counting
lower bound
external memory
265-276
Regular Paper
Xiaocheng
Hu
Xiaocheng Hu
Yufei
Tao
Yufei Tao
Yi
Yang
Yi Yang
Shengyu
Zhang
Shengyu Zhang
Shuigeng
Zhou
Shuigeng Zhou
10.4230/LIPIcs.ICDT.2015.265
Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM (CACM), 31(9):1116-1127, 1988.
Lars Arge, Mikael Knudsen, and Kirsten Larsen. A general lower bound on the I/O-complexity of comparison-based algorithms. In Algorithms and Data Structures Workshop (WADS), pages 83-94, 1993.
Lars Arge and Peter Bro Miltersen. On showing lower bounds for external-memory computational geometry problems. DIMACS Series in Discrete Mathematics, pages 139-159, 1999.
Gerth Stølting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546-554, 2003.
Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. Sparse and lopsided set disjointness via information theory. In APPROX-RANDOM, pages 517-528, 2012.
Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Patrascu. De dictionariis dynamicis pauco spatio utentibus (lat. on dynamic dictionaries using little space). In Latin American Symposium on Theoretical Informatics (LATIN), pages 349-361, 2006.
Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 345-354, 1989.
Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, and Vasilis Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1):35-55, 2002.
John Iacono and Mihai Patrascu. Using hashing to solve the dictionary problem. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 570-582, 2012.
Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 41-52, 2010.
Mihai Patrascu and Erik D. Demaine. Lower bounds for dynamic connectivity. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 546-553, 2004.
Elad Verbin and Qin Zhang. The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 447-456, 2010.
Zhewei Wei, Ke Yi, and Qin Zhang. Dynamic external hashing: the limit of buffering. In Proceedings of Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 253-259, 2009.
Ke Yi. Dynamic indexability and the optimality of B-trees. Journal of the ACM (JACM), 59(4), 2012.
Ke Yi and Qin Zhang. On the cell probe complexity of dynamic membership. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 123-133, 2010.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Shared-Constraint Range Reporting
Orthogonal range reporting is one of the classic and most fundamental data structure problems. A (2,1,1) query is a three-dimensional query with a two-sided constraint on the first dimension and a one-sided constraint on each of the second and third dimensions. Given a set of N points in three dimensions, a particular formulation of such a (2,1,1) query (known as four-sided range reporting in three dimensions) asks to report all K points within a query region [a, b] × (-infinity, c] × [d, infinity). These queries have four constraints overall. In the Word-RAM model, the best known structure capable of answering such queries with optimal query time takes O(N log^{epsilon} N) space, where epsilon > 0 is any positive constant. It has been shown that any external memory structure answering such queries in optimal I/Os must use Omega(N log N / log log_B N) space (in words), where B is the block size [Arge et al., PODS 1999]. In this paper, we study a special type of (2,1,1) query, where the query parameters a and c are the same, i.e., a = c. Even though the query is still four-sided, the number of independent constraints is only three. In other words, one constraint is shared. We call this the Shared-Constraint Range Reporting (SCRR) problem. We study this problem in both internal and external memory models. In the RAM model where coordinates can only be compared, we achieve a linear-space solution with O(log N + K) query time, matching the best-known three-dimensional dominance query bound. In external memory, we present a linear-space structure with O(log_B N + log log N + K/B) query I/Os. We also present an I/O-optimal (i.e., O(log_B N + K/B) I/Os) data structure which occupies O(N log log N)-word space. We achieve these results by employing a novel divide-and-conquer approach. SCRR finds application in database queries containing sharing among the constraints.
We also show that SCRR queries naturally arise in many well known problems such as top-k color reporting, range skyline reporting and ranked document retrieval.
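As a reference point for what an SCRR query returns, the following brute-force sketch scans all points and applies the three independent constraints that remain once a = c. It takes linear time per query and is not the data structure from the paper; the function name and the tuple layout (x, y, z) are assumptions made for illustration.

```python
def scrr_query(points, a, b, d):
    """Report the points in [a, b] x (-inf, a] x [d, +inf).

    The parameter `a` is shared: it is both the lower bound on the first
    coordinate and the upper bound on the second one.
    """
    return [
        (x, y, z)
        for (x, y, z) in points
        if a <= x <= b and y <= a and z >= d
    ]


if __name__ == "__main__":
    points = [(1, 0, 5), (2, 1, 3), (3, 4, 9), (0, 0, 9)]
    # Only (1, 0, 5) satisfies 1 <= x <= 3, y <= 1 and z >= 4.
    print(scrr_query(points, a=1, b=3, d=4))
```

An index for SCRR has to return the same answers as this scan while meeting the query bounds stated in the abstract.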
data structure
shared constraint
multi-slab
point partitioning
277-290
Regular Paper
Sudip
Biswas
Sudip Biswas
Manish
Patil
Manish Patil
Rahul
Shah
Rahul Shah
Sharma V.
Thankachan
Sharma V. Thankachan
10.4230/LIPIcs.ICDT.2015.277
Peyman Afshani. On dominance reporting in 3d. In ESA, pages 41-51, 2008.
Peyman Afshani, Lars Arge, and Kasper Dalgaard Larsen. Orthogonal range reporting in three and higher dimensions. In FOCS, pages 149-158, 2009.
Peyman Afshani, Lars Arge, and Kasper Dalgaard Larsen. Orthogonal range reporting: query lower bounds, optimal structures in 3-d, and higher-dimensional improvements. In Symposium on Computational Geometry, pages 240-246, 2010.
Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. New data structures for orthogonal range searching. In FOCS, pages 198-207, 2000.
Lars Arge, Mark de Berg, Herman J. Haverkort, and Ke Yi. The priority R-tree: A practically efficient and worst-case optimal R-tree. In SIGMOD Conference, pages 347-358, 2004.
Lars Arge, Vasilis Samoladas, and Jeffrey Scott Vitter. On two-dimensional indexability and optimal range search indexing. In PODS, pages 346-357, 1999.
Jon Louis Bentley. Multidimensional divide-and-conquer. Commun. ACM, 23(4):214-229, 1980.
Gerth Stølting Brodal and Kasper Green Larsen. Optimal planar orthogonal skyline counting queries. CoRR, abs/1304.7959, 2013.
Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. Orthogonal range searching on the RAM, revisited. In Symposium on Computational Geometry, pages 1-10, 2011.
Bernard Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427-462, 1988.
Bernard Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., 17(3):427-462, 1988.
Bernard Chazelle. Lower bounds for orthogonal range searching: I. The reporting case. J. ACM, 37(2):200-212, 1990.
Bernard Chazelle. Lower bounds for orthogonal range searching: II. The arithmetic model. J. ACM, 37(3):439-463, 1990.
Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Space-efficient frameworks for top-k string retrieval. J. ACM, 61(2):9, 2014.
Marek Karpinski and Yakov Nekrich. Top-k color queries for document retrieval. In SODA, pages 401-411, 2011.
Casper Kejlberg-Rasmussen, Yufei Tao, Konstantinos Tsakalidis, Kostas Tsichlas, and Jeonghun Yoon. I/O-efficient planar range skyline and attrition priority queues. In PODS, pages 103-114, 2013.
Kasper Green Larsen and Rasmus Pagh. I/O-efficient data structures for colored range and prefix reporting. In SODA, pages 583-592, 2012.
S. Muthukrishnan. Efficient algorithms for document retrieval problems. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 657-666, 2002.
Yakov Nekrich. External memory range reporting on a grid. In ISAAC, pages 525-535, 2007.
Darren Erik Vengroff and Jeffrey Scott Vitter. Efficient 3-d range searching in external memory. In STOC, pages 192-201, 1996.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data
In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
Coordination-free evaluation
conjunctive queries
broadcasting
291-307
Regular Paper
Bas
Ketsman
Bas Ketsman
Frank
Neven
Frank Neven
10.4230/LIPIcs.ICDT.2015.291
F. N. Afrati and J. D. Ullman. Optimizing joins in a map-reduce environment. In International Conference on Extending Database Technology (EDBT), pages 99-110, 2010.
Foto N. Afrati, Paraschos Koutris, Dan Suciu, and Jeffrey D. Ullman. Parallel skyline queries. In International Conference on Database Theory (ICDT), pages 274-284, 2012.
Peter Alvaro, Neil Conway, Joe Hellerstein, and William R. Marczak. Consistency analysis in bloom: a CALM and collected approach. In Conference on Innovative Data Systems Research (CIDR), pages 249-260, 2011.
Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and David Maier. Blazes: Coordination analysis for distributed programs. In International Conference on Data Engineering (ICDE), pages 52-63. IEEE, 2014.
Tom J. Ameloot, Bas Ketsman, Frank Neven, and Daniel Zinn. Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. In Symposium on Principles of Database Systems (PODS), pages 64-75. ACM, 2014.
Tom J. Ameloot, Frank Neven, and Jan Van den Bussche. Relational transducers for declarative networking. Journal of the ACM, 60(2):15, 2013.
Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Symposium on Principles of Database Systems (PODS), pages 273-284, 2013.
Paul Beame, Paraschos Koutris, and Dan Suciu. Skew in parallel query processing. In Symposium on Principles of Database Systems (PODS), pages 212-223, 2014.
Peter Buneman, James Cheney, Wang Chiew Tan, and Stijn Vansummeren. Curated databases. In Symposium on Principles of Database Systems (PODS), pages 1-12. ACM, 2008.
Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. Why and where: A characterization of data provenance. In International Conference on Database Theory (ICDT), volume 1973 of Lecture Notes in Computer Science, pages 316-330. Springer, 2001.
Neil Conway, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein, and David Maier. Logic and lattices for distributed programming. In Symposium on Cloud Computing (SoCC), page 1. ACM, 2012.
Wenfei Fan, Floris Geerts, and Leonid Libkin. On scale independence for querying big data. In Symposium on Principles of Database Systems (PODS), pages 51-62. ACM, 2014.
Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. Parallel bottom-up processing of datalog queries. Journal of Logic Programming, 14(1&2):101-126, 1992.
Joseph M. Hellerstein. The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Record, 39(1):5-19, 2010.
Paraschos Koutris and Dan Suciu. Parallel evaluation of conjunctive queries. In Symposium on Principles of Database Systems (PODS), pages 223-234, 2011.
Alexandra Meliou, Wolfgang Gatterbauer, Joseph Y. Halpern, Christoph Koch, Katherine F. Moore, and Dan Suciu. Causality in databases. IEEE Data Engineering Bulletin, 33(3):59-67, 2010.
Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, and Dan Suciu. The complexity of causality and responsibility for query answers and non-answers. Proceedings of the VLDB Endowment (PVLDB), 4(1):34-45, 2010.
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 15-28. USENIX Association, 2012.
Daniel Zinn, Todd J. Green, and Bertram Ludäscher. Win-move is coordination-free (sometimes). In International Conference on Database Theory (ICDT), pages 99-113, 2012.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Datalog Queries Distributing over Components
We investigate the class D of queries that distribute over components. These are the queries that can be evaluated by taking the union of the query results over the connected components of the database instance. We show that it is undecidable whether a (positive) Datalog program distributes over components. Additionally, we show that connected Datalog with Negation (the fragment of Datalog with Negation where all rules are connected) provides an effective syntax for Datalog with Negation programs that distribute over components under the stratified as well as under the well-founded semantics. As a corollary, we obtain a simple proof for one of the main results in previous work [Zinn, Green, and Ludäscher, ICDT 2012], namely, that the classic win-move query is in F_2 (a particular class of coordination-free queries).
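The defining property, that a query's result equals the union of its results on the connected components, can be tested on a single instance with the sketch below. It assumes that facts are tuples (relation, v1, ..., vn) with n >= 1 and that two facts are connected when they share a value. All function names are hypothetical, and since the property quantifies over all instances, a check on one instance can refute it but never establish it.

```python
from collections import defaultdict


def components(instance):
    """Split a set of facts into connected components (facts sharing a value)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for fact in instance:
        values = fact[1:]
        for v in values[1:]:
            parent[find(values[0])] = find(v)
    groups = defaultdict(set)
    for fact in instance:
        groups[find(fact[1])].add(fact)
    return list(groups.values())


def two_step(instance):
    """Connected rule: T(x, z) <- E(x, y), E(y, z)."""
    edges = {(f[1], f[2]) for f in instance if f[0] == "E"}
    return {(x, z) for (x, y) in edges for (y2, z) in edges if y == y2}


def all_pairs(instance):
    """Disconnected rule: P(x, y) for any two values occurring in the instance."""
    nodes = {v for f in instance for v in f[1:]}
    return {(x, y) for x in nodes for y in nodes}


def distributes_on(query, instance):
    """Compare the query on the whole instance with the union over components."""
    piecewise = set().union(*(query(c) for c in components(instance)))
    return query(instance) == piecewise


if __name__ == "__main__":
    instance = {("E", 1, 2), ("E", 2, 3), ("E", 7, 8)}
    print(distributes_on(two_step, instance))   # connected rule
    print(distributes_on(all_pairs, instance))  # pairs values across components
```

The second query fails the check because it relates values from different components, which is the kind of behavior that rule connectedness excludes.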
Datalog
stratified semantics
well-founded semantics
coordination-free evaluation
distributed databases
308-323
Regular Paper
Tom J.
Ameloot
Tom J. Ameloot
Bas
Ketsman
Bas Ketsman
Frank
Neven
Frank Neven
Daniel
Zinn
Daniel Zinn
10.4230/LIPIcs.ICDT.2015.308
S. Abiteboul, Z. Abrams, S. Haar, and T. Milo. Diagnosis of asynchronous discrete event systems: Datalog to the rescue! In PODS, pages 358-367. ACM Press, 2005.
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
P. Alvaro, N. Conway, J.M. Hellerstein, and D. Maier. Blazes: Coordination analysis for distributed programs. In IEEE 30th International Conference on Data Engineering, pages 52-63. IEEE, 2014.
T.J. Ameloot, B. Ketsman, F. Neven, and D. Zinn. Weaker forms of monotonicity for declarative networking: A more fine-grained answer to the CALM-conjecture. In PODS, pages 64-75. ACM Press, 2014.
T.J. Ameloot, F. Neven, and J. Van den Bussche. Relational transducers for declarative networking. J. ACM, 60(2):15:1-15:38, 2013.
L. Cabibbo. The expressive power of stratified logic programs with value invention. Information and Computation, 147(1):22-56, 1998.
K.J. Compton. Some useful preservation theorems. Journal of Symbolic Logic, 48:427-440, 1983.
N. Conway, W.R. Marczak, P. Alvaro, J.M. Hellerstein, and D. Maier. Logic and lattices for distributed programming. In Proceedings of the Third ACM Symposium on Cloud Computing, pages 1:1-1:14. ACM Press, 2012.
A. Dawar and S. Kreutzer. On Datalog vs. LFP. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming, pages 160-171. Springer, 2008.
T. Feder and M.Y. Vardi. Homomorphism closed vs. existential positive. In LICS, pages 311-320. IEEE Computer Society, 2003.
I. Guessarian. Deciding boundedness for uniformly connected datalog programs. In S. Abiteboul and P.C. Kanellakis, editors, ICDT, volume 470 of Lecture Notes in Computer Science, pages 395-405. Springer, 1990.
J.M. Hellerstein. The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Record, 39(1):5-19, 2010.
R. Hull and M. Yoshikawa. ILOG: Declarative creation and manipulation of object identifiers. In VLDB, pages 455-468. Morgan Kaufmann Publishers Inc., 1990.
T. Jim and D. Suciu. Dynamically distributed query evaluation. In PODS, pages 28-39. ACM Press, 2001.
D.B. Kemp, D. Srivastava, and P.J. Stuckey. Bottom-up evaluation and query optimization of well-founded models. Theor. Comput. Sci., 146(1&2):145-184, 1995.
B.T. Loo, T. Condie, M. Garofalakis, D.E. Gay, J.M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking: Language, execution and optimization. In SIGMOD, pages 97-108. ACM Press, 2006.
O. Shmueli. Equivalence of Datalog queries is undecidable. The Journal of Logic Programming, 15(3):231-241, 1993.
A. Van Gelder. The alternating fixpoint of logic programs with negation. J. Comput. Syst. Sci., 47(1):185-221, 1993.
D. Zinn, T.J. Green, and B. Ludäscher. Win-move is coordination-free (sometimes). In ICDT, pages 99-113. ACM Press, 2012.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
Distributed Streaming with Finite Memory
We introduce three formal models of distributed systems for query evaluation on massive databases: Distributed Streaming with Register Automata (DSAs), Distributed Streaming with Register Transducers (DSTs), and Distributed Streaming with Register Transducers and Joins (DSTJs). These models are based on the key-value paradigm where the input is transformed into a dataset of key-value pairs, and on each key a local computation is performed on the values associated with that key, resulting in another set of key-value pairs. Computation proceeds in a constant number of rounds, where the result of each round is the input to the next round, and transformation to key-value pairs is required to be generic. The difference between the three models is in the local computation part. In DSAs it is limited to making one pass over its input using a register automaton, while in DSTs it can make two passes: in the first pass it uses a finite-state automaton and in the second it uses a register transducer. The third model, DSTJs, is an extension of DSTs, where local computations are capable of constructing the Cartesian product of two sets. We obtain the following results: (1) DSAs can evaluate first-order queries over bounded degree databases; (2) DSTs can evaluate semijoin algebra queries over arbitrary databases; (3) DSTJs can evaluate the whole relational algebra over arbitrary databases; (4) DSTJs are strictly stronger than DSTs, which, in turn, are strictly stronger than DSAs; (5) within DSAs, DSTs and DSTJs there is a strict hierarchy w.r.t. the number of rounds.
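The key-value round structure described above can be sketched informally as follows. The example evaluates a semijoin R(a, b) ⋉ S(b, c) in one round: tuples are keyed on the join value, and the local computation makes two passes over the values of a key, the first to detect whether any S-tuple is present and the second to emit the R-tuples. This only mirrors the two-pass shape of DSTs in spirit; it does not implement register automata or transducers, and all names are hypothetical.

```python
from collections import defaultdict


def run_round(pairs, local):
    """Group key-value pairs by key and apply `local` to each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(local(key, values))
    return output


def semijoin_local(key, values):
    """Two passes over one key's values, each value tagged with its relation."""
    # Pass 1: a single bit of state records whether some S-tuple is present.
    has_s = any(tag == "S" for tag, _ in values)
    # Pass 2: emit the R-tuples only if the bit is set.
    return [(key, row) for tag, row in values if tag == "R" and has_s]


def semijoin(r, s):
    """Return the tuples (a, b) of r whose b occurs as first column in s."""
    pairs = [(b, ("R", (a, b))) for a, b in r]
    pairs += [(b, ("S", (b, c))) for b, c in s]
    return sorted(row for _, row in run_round(pairs, semijoin_local))


if __name__ == "__main__":
    r = [(1, 10), (2, 20), (3, 30)]
    s = [(10, "x"), (30, "y"), (40, "z")]
    print(semijoin(r, s))
```

A multi-round computation would feed the output pairs of one `run_round` call, after re-keying, into the next call.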
distributed systems
relational algebra
semijoin algebra
register automata
register transducers
324-341
Regular Paper
Frank
Neven
Frank Neven
Nicole
Schweikardt
Nicole Schweikardt
Frédéric
Servais
Frédéric Servais
Tony
Tan
Tony Tan
10.4230/LIPIcs.ICDT.2015.324
F. Afrati, V. Borkar, M. Carey, N. Polyzotis, and J. Ullman. Map-reduce extensions and recursive queries. In ICDE, 2011.
F. Afrati, D. Fotakis, and J. Ullman. Enumerating subgraph instances using map-reduce. In ICDE, 2013.
F. Afrati, P. Koutris, D. Suciu, and J. Ullman. Parallel skyline queries. In ICDT, 2012.
F. Afrati, A. Das Sarma, S. Salihoglu, and J. Ullman. Upper and lower bounds on the cost of a map-reduce computation. PVLDB, 6(4):277-288, 2013.
F. Afrati and J. Ullman. Optimizing joins in a map-reduce environment. In EDBT, 2010.
F. Afrati and J. Ullman. Transitive closure and recursive datalog implemented on clusters. In EDBT, 2012.
T. Ameloot, F. Neven, and J. Van den Bussche. Relational transducers for declarative networking. Journal of the ACM, 60(2):15, 2013.
Apache Bagel. Bagel. URL: http://spark.apache.org/docs/0.7.3/bagel-programming-guide.html.
P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. In PODS, 2013.
P. Beame, P. Koutris, and D. Suciu. Skew in parallel query processing. In PODS, 2014.
F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, 2010.
E. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377-387, 1970.
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, 2004.
J. Dean and S. Ghemawat. Mapreduce: a flexible data processing tool. Communications of the ACM, 53(1):72-77, 2010.
A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of mapreduce: The pig experience. PVLDB, 2(2):1414-1425, 2009.
J. Hellerstein. The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Record, 39(1):5-19, 2010.
M. Kaminski and N. Francez. Finite-memory automata. Theoretical Computer Science, 134(2):329-363, 1994.
H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, 2010.
P. Koutris and D. Suciu. Parallel evaluation of conjunctive queries. In PODS, 2011.
R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in mapreduce and streaming. In SPAA, 2013.
S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, 2011.
F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alphabets. ACM Transactions on Computational Logic, 5(3):403-435, 2004.
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD Conference, 2008.
Apache Pig. Pig. URL: http://pig.apache.org/.
Apache Spark. Spark. URL: http://spark.apache.org.
Apache Spark. Spark programming guide. URL: http://spark.apache.org/docs/latest/programming-guide.html.
S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, 2011.
Y. Tao, W. Lin, and X. Xiao. Minimal mapreduce algorithms. In SIGMOD, 2013.
A. Thusoo, J. Sen Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using hadoop. In ICDE, 2010.
L. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
T. White. Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale (3. ed., revised and updated). O'Reilly, 2012.
R. Xin, J. Rosen, M. Zaharia, M. Franklin, S. Shenker, and I. Stoica. Shark: SQL and rich analytics at scale. In SIGMOD, 2013.
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
In this work we establish and investigate connections between causality for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can be applied to the newer problem of determining actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.
causality
diagnosis
repairs
consistent query answering
integrity constraints
342-362
Regular Paper
Babak
Salimi
Babak Salimi
Leopoldo
Bertossi
Leopoldo Bertossi
10.4230/LIPIcs.ICDT.2015.342
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
On the Relationship between Consistent Query Answering and Constraint Satisfaction Problems
Recently, Fontaine has pointed out a connection between consistent query answering (CQA) and constraint satisfaction problems (CSP) [Fontaine, LICS 2013]. We investigate this connection more closely, identifying classes of CQA problems based on denial constraints and GAV constraints that correspond exactly to CSPs in the sense that a complexity classification of the CQA problems in each class is equivalent (up to FO-reductions) to classifying the complexity of all CSPs. We obtain these classes by admitting only monadic relations and only a single variable in denial constraints/GAVs and restricting queries to hypertree UCQs. We also observe that dropping the requirement of UCQs to be hypertrees corresponds to transitioning from CSP to its logical generalization MMSNP and identify a further relaxation that corresponds to transitioning from MMSNP to GMSNP (also known as MMSNP_2). Moreover, we use the CSP connection to carry over decidability of FO-rewritability and Datalog-rewritability to some of the identified classes of CQA problems.
Consistent Query Answering
Constraint Satisfaction
Data Complexity
Dichotomies
Rewritability
363-379
Regular Paper
Carsten
Lutz
Carsten Lutz
Frank
Wolter
Frank Wolter
10.4230/LIPIcs.ICDT.2015.363
Foto N. Afrati and Phokion G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In Proc. of ICDT, volume 361 of ACM International Conference Proceeding Series, pages 31-41. ACM, 2009.
Marcelo Arenas and Leopoldo E. Bertossi. On the decidability of consistent query answering. In Proc. of AMW, volume 619 of CEUR Workshop Proceedings. CEUR-WS.org, 2010.
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In Proc. of PODS, pages 68-79. ACM Press, 1999.
Albert Atserias. On digraph coloring problems and treewidth duality. In Proc. of LICS, pages 106-115, 2005.
Libor Barto and Marcin Kozik. Constraint satisfaction problems of bounded width. In Proc. of FOCS, pages 595-603, 2009.
Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the desirability of acyclic database schemes. J. ACM, 30(3):479-513, 1983.
Leopoldo E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68-76, 2006.
Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf. Syst., 33(4-5):407-434, 2008.
Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. First-order rewritability of atomic queries in Horn description logics. In Proc. of IJCAI. IJCAI/AAAI, 2013.
Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-based data access: a study through disjunctive datalog, CSP, and MMSNP. In Proc. of PODS, pages 213-224. ACM, 2013.
Manuel Bodirsky, Hubie Chen, and Tomás Feder. On the complexity of MMSNP. SIAM J. Discrete Math., 26(1):404-414, 2012.
Manuel Bodirsky and Florent Madelaine. Feder and Vardi’s logic revisited. In preparation.
Andrei A. Bulatov. A dichotomy theorem for constraint satisfaction problems on a 3-element set. J. ACM, 53(1):66-120, 2006.
Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. In Proc. of PODS, pages 260-271. ACM, 2003.
Jan Chomicki. Consistent query answering: Five easy pieces. In Proc. of ICDT, volume 4353 of LNCS, pages 1-17. Springer, 2007.
Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1-2):90-121, 2005.
Jan Chomicki, Jerzy Marcinkowski, and Slawomir Staworko. Computing consistent query answers using conflict hypergraphs. In Proc. of CIKM, pages 417-426. ACM, 2004.
David Cohen and Peter Jeavons. The complexity of constraint languages, chapter 8. Elsevier, 2006.
Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable optimization problems for database logic programs (preliminary report). In Proc. of STOC, pages 477-490. ACM, 1988.
Tomás Feder and Moshe Y. Vardi. The computational structure of monotone monadic SNP and constraint satisfaction: A study through datalog and group theory. SIAM J. Comput., 28(1):57-104, 1998.
Tomás Feder and Moshe Y. Vardi. Homomorphism closed vs. existential positive. In 18th IEEE Symposium on Logic in Computer Science (LICS 2003), 22-25 June 2003, Ottawa, Canada, Proceedings, pages 311-320, 2003.
Gaëlle Fontaine. Why is it hard to obtain a dichotomy for consistent query answering? In Proc. of LICS, pages 550-559. IEEE Computer Society, 2013.
Ralph Freese, Marcin Kozik, Andrei Krokhin, Miklós Maróti, Ralph McKenzie, and Ross Willard. On Maltsev conditions associated with omitting certain types of local structures. In preparation. Manuscript available from URL: http://www.math.hawaii.edu/~ralph/Classes/619/OmittingTypesMaltsev.pdf.
Ariel Fuxman and Renée J. Miller. First-order query rewriting for inconsistent databases. In Proc. of ICDT, volume 3363 of LNCS, pages 337-351. Springer, 2005.
Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci., 64(3):579-627, 2002.
Neil Immerman. Descriptive complexity. Springer, 1999.
Phokion G. Kolaitis and Enela Pema. A dichotomy in the complexity of consistent query answering for queries with two atoms. Inf. Process. Lett., 112(3):77-85, 2012.
Paraschos Koutris and Dan Suciu. A dichotomy on the complexity of consistent query answering for atoms with simple keys. In Proc. of ICDT, pages 165-176. OpenProceedings.org, 2014.
Gábor Kun. Constraints, MMSNP and expander relational structures. Combinatorica, 33(3):335-347, 2013.
Benoit Larose, Cynthia Loten, and Claude Tardif. A characterisation of first-order constraint satisfaction problems. Logical Methods in Computer Science, 3(4), 2007.
Benoit Larose and Pascal Tesson. Universal algebra and hardness results for constraint satisfaction problems. Theor. Comput. Sci., 410(18):1629-1647, 2009.
Florent R. Madelaine. Universal structures and the logic of forbidden patterns. Logical Methods in Computer Science, 5(2), 2009.
Florent R. Madelaine and Iain A. Stewart. Constraint satisfaction, logic and forbidden patterns. SIAM J. Comput., 37(1):132-163, 2007.
Jaroslav Nesetril. Many facets of dualities. In Bonn Workshop of Combinatorial Optimization, pages 285-302, 2008.
Benjamin Rossman. Homomorphism preservation theorems. J. ACM, 55(3), 2008.
Thomas J. Schaefer. The complexity of satisfiability problems. In Proc. of STOC, pages 216-226. ACM, 1978.
Slawomir Staworko and Jan Chomicki. Consistent query answers in the presence of universal constraints. Inf. Syst., 35(1):1-22, 2010.
Balder ten Cate, Gaëlle Fontaine, and Phokion G. Kolaitis. On the data complexity of consistent query answering. In Proc. of ICDT, pages 22-33. ACM, 2012.
Jef Wijsen. On the first-order expressibility of computing certain answers to conjunctive queries over uncertain databases. In Proc. of PODS, pages 179-190. ACM, 2010.
Jef Wijsen. Charting the tractability frontier of certain conjunctive query answering. In Proc. of PODS, pages 189-200. ACM, 2013.
Jef Wijsen. A survey of the data complexity of consistent query answering under key constraints. In Proc. of FoIKS, volume 8367 of LNCS, pages 62-78. Springer, 2014.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode
On the Data Complexity of Consistent Query Answering over Graph Databases
Areas in which graph databases are applied - such as the semantic web, social networks and scientific databases - are prone to inconsistency, mainly due to interoperability issues. This raises the need for understanding query answering over inconsistent graph databases in a framework that is simple yet general enough to accommodate many of its applications. We follow the well-known approach of consistent query answering (CQA), and study the data complexity of CQA over graph databases for regular path queries (RPQs) and regular path constraints (RPCs), which are frequently used. We concentrate on subset, superset and symmetric difference repairs. Without further restrictions, CQA is undecidable for the semantics based on superset and symmetric difference repairs, and Pi_2^P-complete for subset repairs. However, we provide several tractable restrictions on both RPCs and the structure of graph databases that lead to decidability, and even tractability of CQA. We also compare our results with those obtained for CQA in the context of relational databases.
graph databases
regular path queries
consistent query answering
description logics
rewrite systems
380-397
Regular Paper
Pablo
Barceló
Pablo Barceló
Gaëlle
Fontaine
Gaëlle Fontaine
10.4230/LIPIcs.ICDT.2015.380
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode