Brief Announcement: MapReduce Algorithms for Massive Trees

Authors MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Vahab Mirrokni



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2018.162.pdf
  • Filesize: 267 kB
  • 4 pages

Document Identifiers

Author Details

MohammadHossein Bateni
  • Google Research, New York
Soheil Behnezhad
  • University of Maryland
Mahsa Derakhshan
  • University of Maryland
MohammadTaghi Hajiaghayi
  • University of Maryland
Vahab Mirrokni
  • Google Research, New York

Cite AsGet BibTex

MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, and Vahab Mirrokni. Brief Announcement: MapReduce Algorithms for Massive Trees. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 162:1-162:4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.162

Abstract

Solving large-scale graph problems is a fundamental task in many real-world applications, and it is an increasingly important problem in data analysis. Despite the large effort in designing scalable graph algorithms, many classic graph problems lack algorithms that require only a sublinear number of machines and space in the input size. Specifically when the input graph is large and sparse, which is indeed the case for many real-world graphs, it becomes impossible to store and access all the vertices in one machine - something that is often taken for granted in designing algorithms for massive graphs. The theoretical model that we consider is the Massively Parallel Communications (MPC) model which is a popular theoretical model of MapReduce-like systems. In this paper, we give an algorithmic framework to adapt a large family of dynamic programs on MPC. We start by introducing two classes of dynamic programming problems, namely "(poly log)-expressible" and "linear-expressible" problems. We show that both classes can be solved efficiently using a sublinear number of machines and a sublinear memory per machine. To achieve this result, we introduce a series of techniques that can be plugged together. To illustrate the generality of our framework, we implement in O(log n) rounds of MPC, the dynamic programming solution of fundamental problems such as minimum bisection, k-spanning tree, maximum independent set, longest path, etc., when the input graph is a tree.

Subject Classification

ACM Subject Classification
  • Theory of computation → MapReduce algorithms
  • Theory of computation → Massively parallel algorithms
  • Theory of computation → Dynamic programming
Keywords
  • MapReduce
  • Trees

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure via linear measurements. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 459-467, 2012. URL: http://portal.acm.org/citation.cfm?id=2095156&CFID=63838676&CFTOKEN=79617016.
  2. Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 574-583. ACM, 2014. Google Scholar
  3. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems, pages 273-284. ACM, 2013. Google Scholar
  4. Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1326-1344, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch92.
  5. Rajesh Hemant Chitnis, Graham Cormode, Mohammad Taghi Hajiaghayi, and Morteza Monemizadeh. Parameterized streaming: Maximal matching and vertex cover. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1234-1251, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.82.
  6. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008. Google Scholar
  7. Hossein Esfandiari, Mohammad Taghi Hajiaghayi, Vahid Liaghat, Morteza Monemizadeh, and Krzysztof Onak. Streaming algorithms for estimating the matching size in planar graphs and beyond. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1217-1233, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.81.
  8. Michael T Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the mapreduce framework. In International Symposium on Algorithms and Computation, pages 374-383. Springer, 2011. Google Scholar
  9. Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 938-948. Society for Industrial and Applied Mathematics, 2010. Google Scholar
  10. Tom White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012. Google Scholar
  11. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, pages 10-10, 2010. URL: http://dl.acm.org/citation.cfm?id=1863103.1863113.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail