Parallel-Correctness and Parallel-Boundedness for Datalog Programs

Authors Frank Neven, Thomas Schwentick, Christopher Spinrath, Brecht Vandevoort

Author Details

Frank Neven
  • Hasselt University and transnational University of Limburg, Belgium
Thomas Schwentick
  • Dortmund University, Germany
Christopher Spinrath
  • Dortmund University, Germany
Brecht Vandevoort
  • Hasselt University and transnational University of Limburg, Belgium

Cite As

Frank Neven, Thomas Schwentick, Christopher Spinrath, and Brecht Vandevoort. Parallel-Correctness and Parallel-Boundedness for Datalog Programs. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICDT.2019.14

Abstract

Recently, Ketsman et al. started the investigation of the parallel evaluation of recursive queries in the Massively Parallel Communication (MPC) model. Among other things, it was shown that parallel-correctness and parallel-boundedness for general Datalog programs are undecidable, by a reduction from the undecidable containment problem for Datalog. Furthermore, economic policies were introduced as a means to specify data distribution in a recursive setting. In this paper, we extend the latter framework to account for more general distributed evaluation strategies in terms of communication policies. We then show that the undecidability of parallel-correctness runs deeper: it already holds for fragments of Datalog with a decidable containment problem, e.g., monadic and frontier-guarded Datalog, under relatively simple evaluation strategies. These simple evaluation strategies are defined with respect to data-moving distribution constraints. We then investigate restrictions of economic policies that yield decidability. In particular, we show that parallel-correctness is 2EXPTIME-complete for monadic and frontier-guarded Datalog under hash-based economic policies. Next, we consider restrictions of data-moving constraints and show that parallel-correctness and parallel-boundedness are 2EXPTIME-complete for frontier-guarded Datalog. Interestingly, distributed evaluation no longer preserves the usual containment relationships between fragments of Datalog. Indeed, not every monadic Datalog program is equivalent to a frontier-guarded one in the distributed setting. We illustrate the latter by considering two alternative settings, in one of which parallel-correctness is decidable for frontier-guarded Datalog but undecidable for monadic Datalog.
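
To give a rough feel for what parallel-correctness asks, the following toy sketch compares a centralized evaluation of a transitive-closure program with a deliberately simplified distributed one. It is an illustration under strong assumptions and not the formal model of the paper: the helper names (`transitive_closure`, `distribute`, `distributed_closure`), the single communication-free round, and the modulo-based placement of edges are all hypothetical choices made for this example.

```python
def transitive_closure(edges):
    """Naive fixpoint for the Datalog program
         T(x, y) :- E(x, y).
         T(x, z) :- T(x, y), E(y, z).
    """
    closure = set(edges)
    while True:
        derived = {(x, z) for (x, y) in closure for (y2, z) in edges if y == y2}
        if derived <= closure:
            return closure
        closure |= derived


def distribute(edges, num_nodes, key):
    """Hypothetical hash-style placement: each edge goes to node key(edge) % num_nodes."""
    chunks = [set() for _ in range(num_nodes)]
    for edge in edges:
        chunks[key(edge) % num_nodes].add(edge)
    return chunks


def distributed_closure(edges, num_nodes, key):
    """Assumed toy strategy: every node evaluates the program on its local
    chunk only (no further communication), and the outputs are unioned."""
    result = set()
    for chunk in distribute(edges, num_nodes, key):
        result |= transitive_closure(chunk)
    return result


edges = {(1, 2), (2, 3), (3, 4)}
central = transitive_closure(edges)

# Placing edges by their source value splits the path across two nodes.
by_source = distributed_closure(edges, 2, key=lambda e: e[0])
# With a single node the distributed result trivially matches the central one.
one_node = distributed_closure(edges, 1, key=lambda e: e[0])

print(by_source == central)  # False: facts such as T(1, 3) are never derived
print(one_node == central)   # True
```

In this simplified setting, the program is not parallel-correct on the given instance under the two-node placement, because no node sees both edges needed to derive a longer path. The decision problems studied in the paper ask whether such a mismatch can occur for any instance, under much richer multi-round policies.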

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Datalog
  • distributed databases
  • distributed evaluation
  • decision problems
  • complexity

References

  1. Serge Abiteboul, Meghyn Bienvenu, Alban Galland, and Émilien Antoine. A rule-based language for web data management. In Principles of Database Systems, pages 293-304, 2011.
  2. Foto N. Afrati, Manas R. Joglekar, Christopher Ré, Semih Salihoglu, and Jeffrey D. Ullman. GYM: A multiround distributed join algorithm. In International Conference on Database Theory, ICDT 2017, pages 4:1-4:18, 2017.
  3. Foto N. Afrati and Jeffrey D. Ullman. Optimizing Multiway Joins in a Map-Reduce Environment. IEEE Transactions on Knowledge and Data Engineering, 23(9):1282-1298, 2011.
  4. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-Correctness and Transferability for Conjunctive Queries. Journal of the ACM, 64(5):36:1-36:38, 2017. URL: http://dx.doi.org/10.1145/3106412.
  5. Apache Hadoop. URL: https://hadoop.apache.org/.
  6. Apache Spark. URL: https://spark.apache.org/.
  7. Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. Design and Implementation of the LogicBlox System. In ACM SIGMOD International Conference on Management of Data, SIGMOD 2015, pages 1371-1382, 2015.
  8. Vince Bárány, Balder ten Cate, and Martin Otto. Queries with Guarded Negation. Proceedings of the Very Large Database Endowment, 5(11):1328-1339, 2012.
  9. Vince Bárány, Balder ten Cate, and Luc Segoufin. Guarded Negation. In International Colloquium on Automata, Languages, and Programming, ICALP 2011, pages 356-367, 2011.
  10. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication Steps for Parallel Query Processing. Journal of the ACM, 64(6):40:1-40:58, 2017.
  11. Michael Benedikt, Pierre Bourhis, and Pierre Senellart. Monadic Datalog Containment. In International Colloquium on Automata, Languages, and Programming, ICALP 2012, pages 79-91, 2012.
  12. Michael Benedikt, Balder ten Cate, Thomas Colcombet, and Michael Vanden Boom. The complexity of boundedness for guarded logics. In Logic in Computer Science, LICS 2015, pages 293-304, 2015.
  13. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable Highly Expressive Query Languages. In International Joint Conference on Artificial Intelligence, IJCAI 2015, pages 2826-2832, 2015.
  14. Ashok K. Chandra and Philip M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77-90, 1977.
  15. Surajit Chaudhuri and Moshe Y. Vardi. On the equivalence of recursive and nonrecursive datalog programs. Journal of Computer and System Sciences, 54(1):61-78, 1997.
  16. Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable Optimization Problems for Database Logic Programs (Preliminary Report). In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 477-490, 1988. URL: http://dx.doi.org/10.1145/62212.62259.
  17. Sumit Ganguly, Avi Silberschatz, and Shalom Tsur. A Framework for the Parallel Processing of Datalog Queries. In ACM SIGMOD International Conference on Management of Data, SIGMOD 1990, pages 143-152, 1990.
  18. Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation. In International Conference on Database Theory, ICDT 2016, pages 9:1-9:17, 2016.
  19. Gaetano Geck, Frank Neven, and Thomas Schwentick. Distribution Constraints: a Declarative Framework for Reasoning about Data Distributions. Manuscript, 2018.
  20. Daniel Halperin, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz, Vaspol Ruamviboonsuk, Jingjing Wang, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, and Dan Suciu. Demonstration of the Myria Big Data Management Service. In ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, pages 881-884, 2014.
  21. Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution Policies for Datalog. In International Conference on Database Theory, ICDT 2018, pages 17:1-17:22, 2018.
  22. Bas Ketsman, Frank Neven, and Brecht Vandevoort. Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics. In International Conference on Database Theory, ICDT 2018, pages 18:1-18:16, 2018.
  23. Bas Ketsman and Dan Suciu. A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries. In Principles of Database Systems, PODS 2017, pages 417-428, 2017.
  24. Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In International Conference on Database Theory, ICDT 2016, pages 8:1-8:18, 2016.
  25. M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems, Third Edition. Springer, 2011.
  26. Yehoshua Sagiv and Mihalis Yannakakis. Equivalences Among Relational Expressions with the Union and Difference Operators. Journal of the ACM, 27(4):633-655, 1980. URL: http://dx.doi.org/10.1145/322217.322221.
  27. Mark V. Sapir. Minsky Machines and Algorithmic Problems. In Fields of Logic and Computation II - Essays Dedicated to Yuri Gurevich on the Occasion of His 75th Birthday, volume 9300 of Lecture Notes in Computer Science, pages 273-292. Springer, 2015.
  28. Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. Big Data Analytics with Datalog Queries on Spark. In ACM SIGMOD International Conference on Management of Data, SIGMOD 2016, pages 1135-1149, 2016.
  29. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines. Proceedings of the Very Large Database Endowment, 8(12):1542-1553, 2015.
  30. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. Shark: SQL and Rich Analytics at Scale. In ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pages 13-24, 2013.
  31. Erfan Zamanian, Carsten Binnig, and Abdallah Salama. Locality-aware Partitioning in Parallel Database Systems. In ACM SIGMOD International Conference on Management of Data, SIGMOD 2015, pages 17-30, 2015.