Distribution Constraints: The Chase for Distributed Data

Authors Gaetano Geck , Frank Neven , Thomas Schwentick



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2020.13.pdf
  • Filesize: 0.68 MB
  • 19 pages

Document Identifiers

Author Details

Gaetano Geck
  • TU Dortmund University, Germany
Frank Neven
  • Hasselt University and transnational University of Limburg, Belgium
Thomas Schwentick
  • TU Dortmund University, Germany

Acknowledgements

We thank Michael Benedikt, Bas Ketsman, Andreas Pieris, Phokion Kolaitis, Christopher Spinrath, Brecht Vandevoort, and Thomas Zeume for helpful discussions on various aspects of this work.

Cite AsGet BibTex

Gaetano Geck, Frank Neven, and Thomas Schwentick. Distribution Constraints: The Chase for Distributed Data. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 13:1-13:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICDT.2020.13

Abstract

This paper introduces a declarative framework to specify and reason about distributions of data over computing nodes in a distributed setting. More specifically, it proposes distribution constraints which are tuple and equality generating dependencies (tgds and egds) extended with node variables ranging over computing nodes. In particular, they can express co-partitioning constraints and constraints about range-based data distributions by using comparison atoms. The main technical contribution is the study of the implication problem of distribution constraints. While implication is undecidable in general, relevant fragments of so-called data-full constraints are exhibited for which the corresponding implication problems are complete for EXPTIME, PSPACE and NP. These results yield bounds on deciding parallel-correctness for conjunctive queries in the presence of distribution constraints.

Subject Classification

ACM Subject Classification
  • Theory of computation → Database constraints theory
  • Theory of computation → Logic and databases
  • Information systems → Parallel and distributed DBMSs
Keywords
  • tuple-generating dependencies
  • chase
  • conjunctive queries
  • distributed evaluation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. TPC-H. URL: http://www.tpc.org/tpch/.
  2. Serge Abiteboul, Émilien Antoine, and Julia Stoyanovich. The Webdamlog System Managing Distributed Knowledge on the Web. CoRR, abs/1304.4187, 2013. URL: http://arxiv.org/abs/1304.4187.
  3. Serge Abiteboul, Meghyn Bienvenu, Alban Galland, and Émilien Antoine. A rule-based language for web data management. In PODS, pages 293-304, 2011. Google Scholar
  4. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  5. Foto N. Afrati, Chen Li, and Vassia Pavlaki. Data exchange in the presence of arithmetic comparisons. In EDBT, pages 487-498, 2008. Google Scholar
  6. Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, and Russell Sears. Dedalus: Datalog in Time and Space. In Datalog Reloaded, pages 262-281, 2010. Google Scholar
  7. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-Correctness and Transferability for Conjunctive Queries. J. ACM, 64(5):36:1-36:38, 2017. URL: https://doi.org/10.1145/3106412.
  8. Marcelo Arenas, Pablo Barceló, Leonid Libkin, and Filip Murlak. Foundations of Data Exchange. Cambridge University Press, 2014. URL: http://www.cambridge.org/9781107016163.
  9. Alessandro Artale, Roman Kontchakov, Alisa Kovtunova, Vladislav Ryzhikov, Frank Wolter, and Michael Zakharyaschev. Ontology-Mediated Query Answering over Temporal Data: A Survey (Invited Talk). In TIME, pages 1:1-1:37, 2017. Google Scholar
  10. Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the Complexity Lines for Generalized Guarded Existential Rules. In IJCAI, pages 712-717, 2011. Google Scholar
  11. Marianne Baudinet, Jan Chomicki, and Pierre Wolper. Constraint-Generating Dependencies. J. Comput. Syst. Sci., 59(1):94-115, 1999. URL: https://doi.org/10.1006/jcss.1999.1632.
  12. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication Steps for Parallel Query Processing. J. ACM, 64(6):40:1-40:58, 2017. Google Scholar
  13. Catriel Beeri and Moshe Y. Vardi. The Implication Problem for Data Dependencies. In ICALP, pages 73-85, 1981. Google Scholar
  14. Michael Benedikt. How Can Reasoners Simplify Database Querying (And Why Haven't They Done It Yet)? In PODS, pages 1-15, 2018. Google Scholar
  15. Gerald Berger, Georg Gottlob, Andreas Pieris, and Emanuel Sallinger. The Space-Efficient Core of Vadalog. CoRR, abs/1809.05951, 2018. URL: http://arxiv.org/abs/1809.05951.
  16. Andrea Calì, Georg Gottlob, and Michael Kifer. Taming the Infinite Chase: Query Answering under Expressive Relational Constraints. J. Artif. Intell. Res., 48:115-174, 2013. URL: https://doi.org/10.1613/jair.3873.
  17. Andrea Calì, Georg Gottlob, and Andreas Pieris. Advanced Processing for Ontological Queries. PVLDB, 3(1):554-565, 2010. URL: https://doi.org/10.14778/1920841.1920912.
  18. Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou. Inclusion Dependencies and Their Interaction with Functional Dependencies. J. Comput. Syst. Sci., 28(1):29-59, 1984. Google Scholar
  19. Ashok K. Chandra, Harry R. Lewis, and Johann A. Makowsky. Embedded Implicational Dependencies and their Inference Problem. In STOC, pages 342-354, 1981. Google Scholar
  20. Ashok K. Chandra and Philip M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77-90, 1977. Google Scholar
  21. D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. . Hsiao, and R. Rasmussen. The Gamma database machine project. IEEE Transactions on Knowledge and Data Engineering, 2(1):44-62, 1990. Google Scholar
  22. Deming Dou and Stéphane Coulondre. A sound and complete chase procedure for constrained tuple-generating dependencies. J. Intell. Inf. Syst., 40(1):63-84, 2013. URL: https://doi.org/10.1007/s10844-012-0216-5.
  23. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89-124, 2005. URL: https://doi.org/10.1016/j.tcs.2004.10.033.
  24. Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. Composing Schema Mappings: Second-Order Dependencies to the Rescue. In Catriel Beeri and Alin Deutsch, editors, Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, pages 83-94. ACM, 2004. URL: https://doi.org/10.1145/1055558.1055572.
  25. Shinya Fushimi, Masaru Kitsuregawa, and Hidehiko Tanaka. An Overview of The System Software of A Parallel Relational Database Machine GRACE. In VLDB, pages 209-219, 1986. Google Scholar
  26. Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation. In ICDT, 2016. Google Scholar
  27. Gaetano Geck, Frank Neven, and Thomas Schwentick. Distribution constraints: The chase for distributed data, 2020. URL: http://arxiv.org/abs/2003.00965.
  28. Georg Gottlob, Reinhard Pichler, and Vadim Savenkov. Normalization and optimization of schema mappings. VLDB J., 20(2):277-302, 2011. URL: https://doi.org/10.1007/s00778-011-0226-x.
  29. Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution Policies for Datalog. In ICDT, pages 17:1-17:22, 2018. Google Scholar
  30. Bas Ketsman and Dan Suciu. A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries. In PODS, pages 417-428, 2017. Google Scholar
  31. Martin Kleppmann. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly, 2016. URL: http://shop.oreilly.com/product/0636920032175.do.
  32. Yi Lu, Anil Shanbhag, Alekh Jindal, and Samuel Madden. AdaptDB: Adaptive Partitioning for Distributed Joins. PVLDB, 10(5):589-600, 2017. Google Scholar
  33. Michael J. Maher and Divesh Srivastava. Chasing Constrained Tuple-Generating Dependencies. In PODS, pages 128-138, 1996. Google Scholar
  34. Vera Zaychik Moffitt, Julia Stoyanovich, Serge Abiteboul, and Gerome Miklau. Collaborative Access Control in WebdamLog. In SIGMOD, pages 197-211, 2015. Google Scholar
  35. Frank Neven, Thomas Schwentick, Christopher Spinrath, and Brecht Vandevoort. Parallel-Correctness and Parallel-Boundedness for Datalog Programs. In ICDT, pages 14:1-14:19, 2019. Google Scholar
  36. Adrian Constantin Onet. The chase procedure and its applications. PhD Thesis, June 2012. URL: https://spectrum.library.concordia.ca/974476/.
  37. M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems, Third Edition. Springer, 2011. Google Scholar
  38. Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, Alfons Kemper, and Thomas Neumann. Locality-sensitive operators for parallel main-memory database clusters. In ICDE, pages 592-603, 2014. Google Scholar
  39. Bart Samwel, John Cieslewicz, Ben Handy, Jason Govig, Petros Venetis, Chanjun Yang, Keith Peters, Jeff Shute, Daniel Tenedorio, Himani Apte, Felix Weigel, David Wilhite, Jiacheng Yang, Jun Xu, Jiexing Li, Zhan Yuan, Craig Chasseur, Qiang Zeng, Ian Rae, Anurag Biyani, Andrew Harn, Yang Xia, Andrey Gubichev, Amr El-Helw, Orri Erling, Zhepeng Yan, Mohan Yang, Yiqun Wei, Thanh Do, Colin Zheng, Goetz Graefe, Somayeh Sardashti, Ahmed M. Aly, Divy Agrawal, Ashish Gupta, and Shivakumar Venkataraman. F1 query: Declarative querying at scale. PVLDB, 11(12):1835-1848, 2018. URL: http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf, URL: https://doi.org/10.14778/3229863.3229871.
  40. Marco Serafini, Rebecca Taft, Aaron J. Elmore, Andrew Pavlo, Ashraf Aboulnaga, and Michael Stonebraker. Clay: Fine-Grained Adaptive Partitioning for General Database Schemas. PVLDB, 10(4):445-456, 2016. Google Scholar
  41. Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. F1: A distributed SQL database that scales. PVLDB, 6(11):1068-1079, 2013. URL: https://doi.org/10.14778/2536222.2536232.
  42. Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey F. Naughton. Exploiting Data Partitioning To Provide Approximate Results. In BeyondMR@SIGMOD, pages 5:1-5:5, 2018. Google Scholar
  43. Balder ten Cate, Phokion G. Kolaitis, and Walied Othman. Data exchange with arithmetic operations. In EDBT, pages 537-548, 2013. Google Scholar
  44. Ron van der Meyden. The Complexity of Querying Indefinite Data about Linearly Ordered Domains. J. Comput. Syst. Sci., 54(1):113-135, 1997. URL: https://doi.org/10.1006/jcss.1997.1455.
  45. Erfan Zamanian, Carsten Binnig, and Abdallah Salama. Locality-aware Partitioning in Parallel Database Systems. In SIGMOD, pages 17-30, 2015. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail