Distribution Policies for Datalog

Authors Bas Ketsman, Aws Albarghouthi, Paraschos Koutris



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2018.17.pdf
  • Filesize: 0.63 MB
  • 22 pages

Document Identifiers

Author Details

Bas Ketsman
Aws Albarghouthi
Paraschos Koutris

Cite AsGet BibTex

Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution Policies for Datalog. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 17:1-17:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICDT.2018.17

Abstract

Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
Keywords
  • Datalog queries
  • Distributed evaluation
  • Distribution policies

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu, editors. Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1995. Google Scholar
  2. Foto N. Afrati, Vinayak R. Borkar, Michael J. Carey, Neoklis Polyzotis, and Jeffrey D. Ullman. Map-reduce extensions and recursive queries. In Anastasia Ailamaki, Sihem Amer-Yahia, Jignesh M. Patel, Tore Risch, Pierre Senellart, and Julia Stoyanovich, editors, EDBT 2011, 14th International Conference on Extending Database Technology, Uppsala, Sweden, March 21-24, 2011, Proceedings, pages 1-8. ACM, 2011. URL: http://dx.doi.org/10.1145/1951365.1951367.
  3. Foto N. Afrati and Christos H. Papadimitriou. The parallel complexity of simple chain queries. In Moshe Y. Vardi, editor, Proceedings of the Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 23-25, 1987, San Diego, California, USA, pages 210-213. ACM, 1987. URL: http://dx.doi.org/10.1145/28659.28682.
  4. Foto N. Afrati and Jeffrey D. Ullman. Optimizing joins in a map-reduce environment. In Ioana Manolescu, Stefano Spaccapietra, Jens Teubner, Masaru Kitsuregawa, Alain Léger, Felix Naumann, Anastasia Ailamaki, and Fatma Özcan, editors, EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings, volume 426 of ACM International Conference Proceeding Series, pages 99-110. ACM, 2010. URL: http://dx.doi.org/10.1145/1739041.1739056.
  5. Foto N. Afrati and Jeffrey D. Ullman. Transitive closure and recursive datalog implemented on clusters. In Elke A. Rundensteiner, Volker Markl, Ioana Manolescu, Sihem Amer-Yahia, Felix Naumann, and Ismail Ari, editors, 15th International Conference on Extending Database Technology, EDBT '12, Berlin, Germany, March 27-30, 2012, Proceedings, pages 132-143. ACM, 2012. URL: http://dx.doi.org/10.1145/2247596.2247613.
  6. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and transferability for conjunctive queries. In Tova Milo and Diego Calvanese, editors, Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 47-58. ACM, 2015. URL: http://dx.doi.org/10.1145/2745754.2745759.
  7. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Richard Hull and Wenfei Fan, editors, Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pages 273-284. ACM, 2013. URL: http://dx.doi.org/10.1145/2463664.2465224.
  8. Paul Beame, Paraschos Koutris, and Dan Suciu. Skew in parallel query processing. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, Snowbird, UT, USA, June 22-27, 2014, pages 212-223. ACM, 2014. URL: http://dx.doi.org/10.1145/2594538.2594558.
  9. Shumo Chu, Magdalena Balazinska, and Dan Suciu. From theory to practice: Efficient join query evaluation in a parallel database system. In Timos K. Sellis, Susan B. Davidson, and Zachary G. Ives, editors, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 63-78. ACM, 2015. URL: http://dx.doi.org/10.1145/2723372.2750545.
  10. Stavros S. Cosmadakis and Paris C. Kanellakis. Parallel evaluation of recursive rule queries. In Avi Silberschatz, editor, Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, March 24-26, 1986, Cambridge, Massachusetts, USA, pages 280-293. ACM, 1986. URL: http://dx.doi.org/10.1145/6012.15421.
  11. Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI '04, pages 137-150, 2004. URL: http://www.usenix.org/events/osdi04/tech/dean.html.
  12. Hasanat M. Dewan, Salvatore J. Stolfo, Mauricio A. Hernández, and Jae-Jun Hwang. Predictive dynamic load balancing of parallel and distributed rule and query processing. In Richard T. Snodgrass and Marianne Winslett, editors, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994., pages 277-288. ACM Press, 1994. URL: http://dx.doi.org/10.1145/191839.191893.
  13. Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. A framework for the parallel processing of datalog queries. In Hector Garcia-Molina and H. V. Jagadish, editors, Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 23-25, 1990., pages 143-152. ACM Press, 1990. URL: http://dx.doi.org/10.1145/93597.98724.
  14. Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. Parallel bottom-up processing of datalog queries. J. Log. Program., 14(1&2):101-126, 1992. URL: http://dx.doi.org/10.1016/0743-1066(92)90048-8.
  15. Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and containment for conjunctive queries with union and negation. CoRR, abs/1512.06246, 2015. URL: http://arxiv.org/abs/1512.06246.
  16. Hadoop. URL: http://hadoop.apache.org/.
  17. Daniel Halperin, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz, Vaspol Ruamviboonsuk, Jingjing Wang, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, and Dan Suciu. Demonstration of the myria big data management service. In Curtis E. Dyreson, Feifei Li, and M. Tamer Özsu, editors, International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 881-884. ACM, 2014. URL: http://dx.doi.org/10.1145/2588555.2594530.
  18. Paris C. Kanellakis. Logic programming and parallel complexity. In Giorgio Ausiello and Paolo Atzeni, editors, ICDT'86, International Conference on Database Theory, Rome, Italy, September 8-10, 1986, Proceedings, volume 243 of Lecture Notes in Computer Science, pages 1-30. Springer, 1986. URL: http://dx.doi.org/10.1007/3-540-17187-8_27.
  19. Bas Ketsman and Dan Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In Emanuel Sallinger, Jan Van den Bussche, and Floris Geerts, editors, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 417-428. ACM, 2017. URL: http://dx.doi.org/10.1145/3034786.3034788.
  20. Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-case optimal algorithms for parallel query processing. In Wim Martens and Thomas Zeume, editors, 19th International Conference on Database Theory, ICDT 2016, Bordeaux, France, March 15-18, 2016, volume 48 of LIPIcs, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.8.
  21. Paraschos Koutris and Dan Suciu. Parallel evaluation of conjunctive queries. In Maurizio Lenzerini and Thomas Schwentick, editors, Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, pages 223-234. ACM, 2011. URL: http://dx.doi.org/10.1145/1989284.1989310.
  22. Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks, and Dan Olteanu. Parallel materialisation of datalog programs in centralised, main-memory RDF systems. In Carla E. Brodley and Peter Stone, editors, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada., pages 129-137. AAAI Press, 2014. URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8505.
  23. Jürgen Seib and Georg Lausen. Parallelizing datalog programs by generalized pivoting. In Daniel J. Rosenkrantz, editor, Proceedings of the Tenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 29-31, 1991, Denver, Colorado, USA, pages 241-251. ACM Press, 1991. URL: http://dx.doi.org/10.1145/113413.113435.
  24. Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S. Lam. Distributed socialite: A datalog-based language for large-scale graph analysis. PVLDB, 6(14):1906-1917, 2013. URL: http://www.vldb.org/pvldb/vol6/p1906-seo.pdf.
  25. Marianne Shaw, Paraschos Koutris, Bill Howe, and Dan Suciu. Optimizing large-scale semi-naïve datalog evaluation in hadoop. In Pablo Barceló and Reinhard Pichler, editors, Datalog in Academia and Industry - Second International Workshop, Datalog 2.0, Vienna, Austria, September 11-13, 2012. Proceedings, volume 7494 of Lecture Notes in Computer Science, pages 165-176. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-32925-8_17.
  26. Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. Big data analytics with datalog queries on spark. In Fatma Özcan, Georgia Koutrika, and Sam Madden, editors, Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 1135-1149. ACM, 2016. URL: http://dx.doi.org/10.1145/2882903.2915229.
  27. Apache spark. URL: http://spark.apache.org/.
  28. Jeffrey D. Ullman and Allen Van Gelder. Parallel complexity of logical query programs. Algorithmica, 3:5-42, 1988. URL: http://dx.doi.org/10.1007/BF01762108.
  29. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB, 8(12):1542-1553, 2015. URL: http://www.vldb.org/pvldb/vol8/p1542-wang.pdf.
  30. Ouri Wolfson. Sharing the load of logic-program evaluation. In Sushil Jajodia, Won Kim, and Abraham Silberschatz, editors, Proceedings of the International Symposium on Databases in Parallel and Distributed Systems, Austin, Texas, USA, December 5-7, 1988, pages 46-55. IEEE Computer Society, 1988. URL: http://dx.doi.org/10.1109/DPDS.1988.675001.
  31. Ouri Wolfson and Aya Ozeri. A new paradigm for parallel and distributed rule-processing. SIGMOD Rec., 19(2):133-142, 1990. URL: http://dx.doi.org/10.1145/93605.98723.
  32. Ouri Wolfson and Avi Silberschatz. Distributed processing of logic programs. SIGMOD Rec., 17(3):329-336, 1988. URL: http://dx.doi.org/10.1145/971701.50242.
  33. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. Shark: SQL and rich analytics at scale. In Kenneth A. Ross, Divesh Srivastava, and Dimitris Papadias, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, pages 13-24. ACM, 2013. URL: http://dx.doi.org/10.1145/2463676.2465288.
  34. Weining Zhang, Ke Wang, and Siu-Cheung Chau. Data partition and parallel evaluation of datalog programs. IEEE Trans. Knowl. Data Eng., 7(1):163-176, 1995. URL: http://dx.doi.org/10.1109/69.368511.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail