Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics

Authors Bas Ketsman, Frank Neven, Brecht Vandevoort



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2018.18.pdf
  • Filesize: 0.59 MB
  • 16 pages

Document Identifiers

Author Details

Bas Ketsman
Frank Neven
Brecht Vandevoort

Cite AsGet BibTex

Bas Ketsman, Frank Neven, and Brecht Vandevoort. Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICDT.2018.18

Abstract

Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
Keywords
  • Conjunctive queries
  • distributed evaluation
  • bag semantics

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Foto N. Afrati, Manas R. Joglekar, Christopher Ré, Semih Salihoglu, and Jeffrey D. Ullman. GYM: A multiround distributed join algorithm. In ICDT, pages 4:1-4:18, 2017. Google Scholar
  2. Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, and Jeffrey D. Ullman. Upper and lower bounds on the cost of a map-reduce computation. PVLDB, 6(4):277-288, 2013. Google Scholar
  3. Foto N. Afrati and Jeffrey D. Ullman. Optimizing joins in a map-reduce environment. In EDBT, pages 99-110, 2010. Google Scholar
  4. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Data partitioning for single-round multi-join evaluation in massively parallel systems. SIGMOD Record, 45(1):33-40, 2016. Google Scholar
  5. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and transferability for conjunctive queries. J. ACM, 64(5):36:1-36:38, 2017. Google Scholar
  6. Paul Beame, Paraschos Koutris, and Dan Suciu. Skew in parallel query processing. In PODS, pages 212-223. ACM, 2014. Google Scholar
  7. Surajit Chaudhuri and Moshe Y. Vardi. Optimization of Real conjunctive queries. In PODS, pages 59-70. ACM Press, 1993. Google Scholar
  8. Sara Cohen. Equivalence of queries that are sensitive to multiplicities. VLDB J., 18(3):765-785, 2009. Google Scholar
  9. Sumit Ganguly, Abraham Silberschatz, and Shalom Tsur. A framework for the parallel processing of datalog queries. In SIGMOD, pages 143-152. ACM Press, 1990. Google Scholar
  10. Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and containment for conjunctive queries with union and negation. In ICDT, pages 9:1-9:17, 2016. Google Scholar
  11. Hadoop. URL: https://hadoop.apache.org/.
  12. Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution policies for datalog. In ICDT, pages 17:1-17:22, 2018. Google Scholar
  13. Bas Ketsman and Dan Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In PODS, pages 417-428, 2017. Google Scholar
  14. Paraschos Koutris and Dan Suciu. Parallel evaluation of conjunctive queries. In PODS, pages 223-234, 2011. Google Scholar
  15. Rimma Nehme and Nicolas Bruno. Automated partitioning design in parallel database systems. In SIGMOD, pages 1137-1148. ACM, 2011. Google Scholar
  16. Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558-569, 2002. Google Scholar
  17. Ron van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS, pages 331-345, 1992. Google Scholar
  18. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. Shark: SQL and rich analytics at scale. In SIGMOD, pages 13-24, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail