A Simple Parallel Algorithm for Natural Joins on Binary Relations

Author Yufei Tao



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2020.25.pdf
  • Filesize: 0.58 MB
  • 18 pages

Document Identifiers

Author Details

Yufei Tao
  • Chinese University of Hong Kong, Hong Kong

Cite AsGet BibTex

Yufei Tao. A Simple Parallel Algorithm for Natural Joins on Binary Relations. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICDT.2020.25

Abstract

In PODS'17, Ketsman and Suciu gave an algorithm in the MPC model for computing the result of any natural join where every input relation has two attributes. Achieving an optimal load O(m/p^{1/ρ}) - where m is the total size of the input relations, p the number of machines, and ρ the fractional edge covering number of the join - their algorithm requires 7 rounds to finish. This paper presents a simpler algorithm that ensures the same load with 3 rounds (in fact, the second round incurs only a load of O(p²) to transmit certain statistics to assist machine allocation in the last round). Our algorithm is made possible by a new theorem that provides fresh insight on the structure of the problem, and brings us closer to understanding the intrinsic reason why joins on binary relations can be settled with load O(m/p^{1/ρ}).

Subject Classification

ACM Subject Classification
  • Theory of computation → Database query processing and optimization (theory)
Keywords
  • Natural Joins
  • Conjunctive Queries
  • MPC Algorithms
  • Parallel Computing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. Google Scholar
  2. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Alexander Rasin, and Avi Silberschatz. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Proceedings of the VLDB Endowment (PVLDB), 2(1):922-933, 2009. Google Scholar
  3. Foto N. Afrati and Jeffrey D. Ullman. Optimizing Multiway Joins in a Map-Reduce Environment. IEEE Transactions on Knowledge and Data Engineering (TKDE), 23(9):1282-1298, 2011. Google Scholar
  4. Albert Atserias, Martin Grohe, and Daniel Marx. Size Bounds and Query Plans for Relational Joins. SIAM J. Comput., 42(4):1737-1767, 2013. Google Scholar
  5. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication Steps for Parallel Query Processing. Journal of the ACM (JACM), 64(6):40:1-40:58, 2017. Google Scholar
  6. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 137-150, 2004. Google Scholar
  7. Xiao Hu, Paraschos Koutris, and Ke Yi. An External-Memory Work-Depth Model and Its Applications to Massively Parallel Join Algorithms. Manuscript, 2018. Google Scholar
  8. Xiaocheng Hu, Miao Qiao, and Yufei Tao. I/O-efficient join dependency testing, loomis-whitney join, and triangle enumeration. Journal of Computer and System Sciences (JCSS), 82(8):1300-1315, 2016. Google Scholar
  9. Bas Ketsman and Dan Suciu. A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 417-428, 2017. Google Scholar
  10. Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In Proceedings of International Conference on Database Theory (ICDT), pages 8:1-8:18, 2016. Google Scholar
  11. Hung Q. Ngo, Ely Porat, Christopher Re, and Atri Rudra. Worst-case Optimal Join Algorithms. Journal of the ACM (JACM), 65(3):16:1-16:40, 2018. Google Scholar
  12. Rasmus Pagh and Francesco Silvestri. The input/output complexity of triangle enumeration. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 224-233, 2014. Google Scholar
  13. Edward R. Scheinerman and Daniel H. Ullman. Fractional Graph Theory: A Rational Approach to the Theory of Graphs. Wiley, New York, 1997. Google Scholar
  14. Todd L. Veldhuizen. Triejoin: A Simple, Worst-Case Optimal Join Algorithm. In Proceedings of International Conference on Database Theory (ICDT), pages 96-106, 2014. Google Scholar
  15. Mihalis Yannakakis. Algorithms for Acyclic Database Schemes. In Very Large Data Bases, 7th International Conference, September 9-11, 1981, Cannes, France, Proceedings, pages 82-94, 1981. Google Scholar