GYM: A Multiround Distributed Join Algorithm

Authors Foto N. Afrati, Manas R. Joglekar, Christopher M. Re, Semih Salihoglu, Jeffrey D. Ullman



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2017.4.pdf
  • Filesize: 0.97 MB
  • 18 pages

Document Identifiers

Author Details

Foto N. Afrati
Manas R. Joglekar
Christopher M. Re
Semih Salihoglu
Jeffrey D. Ullman

Cite AsGet BibTex

Foto N. Afrati, Manas R. Joglekar, Christopher M. Re, Semih Salihoglu, and Jeffrey D. Ullman. GYM: A Multiround Distributed Join Algorithm. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICDT.2017.4

Abstract

Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for several rounds for the problem of computing the equijoin of n relations. Given any query Q with width w, intersection width iw, input size IN, output size OUT, and a cluster of machines with M=\Omega(IN \frac{1}{\epsilon}) memory available per machine, where \epsilon > 1 and w \ge 1 are constants, we show that: 1. Q can be computed in O(n) rounds with O(n(INw + OUT)2/M) communication cost with high probability. Q can be computed in O(log(n)) rounds with O(n(INmax(w, 3iw) + OUT)2/M) communication cost with high probability. Intersection width is a new notion we introduce for queries and generalized hypertree decompositions (GHDs) of queries that captures how connected the adjacent components of the GHDs are. We achieve our first result by introducing a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any GHD of Q with width w and depth d, and computes Q in O(d + log(n)) rounds and O(n (INw + OUT)2/M) communication cost. We achieve our second result by showing how to construct GHDs of Q with width max(w, 3iw) and depth O(log(n)). We describe another technique to construct GHDs with longer widths and lower depths, demonstrating other tradeoffs one can make between communication and the number of rounds.
Keywords
  • Joins
  • Yannakakis
  • Bulk Synchronous Processing
  • GHDs

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. F. Afrati, M. Joglekar, C. Ré, Salihoglu S., and J. D. Ullman. GYM: A Multiround Join Algorithm in MapReduce and Its Analysis. CoRR, abs/1410.4156, 2014. URL: http://arxiv.org/abs/1410.4156.
  2. F. N. Afrati, A. D. Sarma, S. Salihoglu, and J. D. Ullman. Upper and Lower Bounds on the Cost of a Map-Reduce Computation. In VLDB, 2013. Google Scholar
  3. F. N. Afrati and J. D. Ullman. Optimizing Multiway Joins in a Map-Reduce Environment. IEEE TKDE, 2011. Google Scholar
  4. D. Akatov. Exploiting Parallelism in Decomposition Methods for Constraint Satisfaction. PhD thesis, University of Oxford, 2010. Google Scholar
  5. Apache Hadoop. URL: http://hadoop.apache.org/.
  6. P. Beame, P. Koutris, and D. Suciu. Communication Steps for Parallel Query Processing. In PODS, 2013. Google Scholar
  7. H. L. Bodlaender. NC-Algorithms for Graphs with Small Treewidth. In Graph-Theoretic Concepts in Computer Science, 1988. Google Scholar
  8. C. Chekuri and A. Rajaraman. Conjunctive Query Containment Revisited. TCS, 2000. Google Scholar
  9. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004. Google Scholar
  10. A. Durand and E. Grandjean. The Complexity of Acyclic Conjunctive Queries Revisited. CoRR, abs/cs/0605008, 2006. URL: http://arxiv.org/abs/cs/0605008.
  11. M. T. Goodrich, N. Sitchinava, and Q. Zhang. Sorting, Searching, and Simulation in the Mapreduce Framework. In ISAAC, 2011. Google Scholar
  12. G. Gottlob, M. Grohe, N. Musliu, M. Samer, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. In J. Comput. Syst. Sci, 2003. Google Scholar
  13. G. Gottlob, M. Grohe, N. Musliu, M. Samer, and F. Scarcello. Hypertree Decompositions: Structure, Algorithms, and Applications. In WG, 2005. Google Scholar
  14. G. Gottlob, N. Leone, and F. Scarcello. Advanced Parallel Algorithms for Acyclic Conjunctive Queries. Technical report, Vienna University of Technology, 1998. Google Scholar
  15. G. Gottlob, N. Leone, and F. Scarcello. On Tractable Queries and Constraints. In DEXA, 1999. Google Scholar
  16. G. Gottlob, N. Leone, and F. Scarcello. The Complexity of Acyclic Conjunctive Queries. J. ACM, 2001. Google Scholar
  17. G. Gottlob, N. Leone, and F. Scarcello. Hypertree Decompositions and Tractable Queries. In J. Comput. Syst. Sci., 2002. Google Scholar
  18. D. Halperin, V. Teixeira de Almeida, L. Choo, S. Chu, P. Koutris, D. Moritz, J. Ortiz, V. Ruamviboonsuk, J. Wang, A. Whitaker, S. Xu, M. Balazinska, B. Howe, and D. Suciu. Demonstration of the Myria Big Data Management Service. In SIGMOD, 2014. Google Scholar
  19. S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang. The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis. In New Frontiers in Information and Software as Services. Springer Berlin Heidelberg, 2011. Google Scholar
  20. S. Ibrahim, H. Jin, L. Lu, S. Wu, B. He, and L. Qi. LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud. In International Conference on Cloud Computing Technology and Science, 2010. Google Scholar
  21. H. Karloff, S. Suri, and S. Vassilvitskii. A Model of Computation for MapReduce. In SODA, 2010. Google Scholar
  22. P. Koutris, P. Beame, and D. Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In ICDT, 2016. Google Scholar
  23. P. Koutris and D. Suciu. Parallel Evaluation of Conjunctive Queries. In PODS, 2011. Google Scholar
  24. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. In SIGMOD, 2008. Google Scholar
  25. A. Pietracaprina, G. Pucci, M. Riondato, F. Silvestri, and E. Upfal. Space-round Tradeoffs for MapReduce Computations. In ICS, 2012. Google Scholar
  26. N. Robertson and P. D. Seymour. Graph Minors. II. Algorithmic Aspects of Tree-width. Journal of Algorithms, 1986. Google Scholar
  27. Spark SQL. URL: https://spark.apache.org/sql/.
  28. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: A Warehousing Solution Over a Map-Reduce Framework. VLDB, 2009. Google Scholar
  29. L. G. Valiant. A Bridging Model for Parallel Computation. CACM, August 1990. Google Scholar
  30. M. Yannakakis. Algorithms for Acyclic Database Schemes. In VLDB, 1981. Google Scholar
  31. Zaharia, M. and Chowdhury, M. and Franklin, M. J. and Shenker, S. and Stoica, I. Spark: Cluster Computing with Working Sets. In HotCloud, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail