Better Process Mapping and Sparse Quadratic Assignment

Authors: Christian Schulz, Jesper Larsson Träff



Cite As

Christian Schulz and Jesper Larsson Träff. Better Process Mapping and Sparse Quadratic Assignment. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.SEA.2017.4

Abstract

Communication- and topology-aware process mapping is a powerful approach to reducing communication time in parallel applications with known communication patterns on large, distributed-memory systems. We address the problem as a quadratic assignment problem (QAP), and present algorithms to construct initial mappings of processes to processors as well as fast local search algorithms to further improve the mappings. By exploiting assumptions that typically hold for applications and modern supercomputer systems, such as sparse communication patterns and hierarchically organized communication systems, we arrive at significantly more powerful algorithms for these special QAPs. Our multilevel construction algorithms employ recently developed, perfectly balanced graph partitioning techniques and extensively exploit the given communication system hierarchy. We present improvements to a local search algorithm of Brandfass et al. (2013), and decrease the running time by reducing the time needed to perform swaps in the assignment as well as by carefully constraining local search neighborhoods. Experiments indicate that our algorithms not only dramatically speed up local search but, due to the multilevel approach, also find much better solutions in practice.
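
To make the problem setting concrete, the following is a minimal Python sketch of a sparse QAP objective over a hierarchical machine model, together with a swap gain that only inspects the neighbors of the two swapped processes. It is an illustration under stated assumptions, not the authors' implementation: the names `hier_distance`, `objective`, and `swap_gain`, the adjacency-dict graph format, and the two-level example hierarchy are all hypothetical, and the communication graph and distances are assumed to be symmetric.

```python
from typing import Dict, List

# Sparse, symmetric communication graph (assumed format): u -> {v: volume}
Graph = Dict[int, Dict[int, int]]


def hier_distance(p: int, q: int, sizes: List[int], dists: List[int]) -> int:
    """Distance between processors p and q in a hierarchical machine model.

    sizes[l] is the fan-out at level l (level 0 is innermost, e.g. cores per
    node) and dists[l] is the cost of communicating across level l.
    """
    if p == q:
        return 0
    group = 1
    for size, dist in zip(sizes, dists):
        group *= size
        if p // group == q // group:  # p and q share a subsystem at this level
            return dist
    raise ValueError("processors lie outside the described hierarchy")


def objective(comm: Graph, perm: List[int], sizes: List[int], dists: List[int]) -> int:
    """QAP objective: sum over directed edges (u, v) of C[u][v] * D[perm[u]][perm[v]]."""
    return sum(
        w * hier_distance(perm[u], perm[v], sizes, dists)
        for u, nbrs in comm.items()
        for v, w in nbrs.items()
    )


def swap_gain(comm: Graph, perm: List[int], u: int, v: int,
              sizes: List[int], dists: List[int]) -> int:
    """Decrease of the objective if processes u and v exchange processors.

    Only edges incident to u or v are visited, so the work is proportional
    to deg(u) + deg(v) rather than to the number of processes.
    """
    pu, pv = perm[u], perm[v]

    def incident_cost(x: int, px: int, other: int, p_other: int) -> int:
        # Cost of the edges leaving x when x sits on px and `other` on p_other.
        cost = 0
        for y, w in comm.get(x, {}).items():
            py = p_other if y == other else perm[y]
            cost += w * hier_distance(px, py, sizes, dists)
        return cost

    before = incident_cost(u, pu, v, pv) + incident_cost(v, pv, u, pu)
    after = incident_cost(u, pv, v, pu) + incident_cost(v, pu, u, pv)
    # Each undirected edge appears twice in the objective (once per direction).
    return 2 * (before - after)


# Hypothetical example: 2 nodes with 2 cores each; intra-node cost 1, inter-node cost 10.
sizes, dists = [2, 2], [1, 10]
comm: Graph = {0: {2: 5, 1: 1}, 1: {3: 5, 0: 1}, 2: {0: 5}, 3: {1: 5}}
perm = [0, 1, 2, 3]  # process i runs on processor perm[i]

gain = swap_gain(comm, perm, 1, 2, sizes, dists)
if gain > 0:
    perm[1], perm[2] = perm[2], perm[1]
```

In this sketch the gain of a swap is computed from the two affected adjacency lists alone, which is the kind of saving that sparse communication patterns make possible; a full local search would additionally need a rule for choosing which pairs to try.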

Subject Classification

Keywords
  • rank reordering
  • graph algorithms
  • process mapping
  • graph partitioning

References

  1. A. H. Abdel-Gawad, M. Thottethodi, and A. Bhatele. RAHTM: Routing Algorithm Aware Hierarchical Task Mapping. In Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 325-335, 2014.
  2. D. A. Bader, H. Meyerhenke, P. Sanders, C. Schulz, A. Kappes, and D. Wagner. Benchmarking for graph clustering and partitioning. In Encyclopedia of Social Network Analysis and Mining, pages 73-82. Springer, 2014.
  3. M. A. Bender and M. Farach-Colton. The LCA problem revisited. In Latin American Symposium on Theoretical Informatics, volume 1776, pages 88-94. Springer, LNCS, 2000.
  4. C. Bichot and P. Siarry, editors. Graph Partitioning. Wiley, 2011.
  5. S. H. Bokhari. On the mapping problem. IEEE Trans. Computers, 30(3):207-214, 1981. URL: http://dx.doi.org/10.1109/TC.1981.1675756.
  6. S. H. Bokhari. Assignment problems in parallel and distributed computing, 2012.
  7. B. Brandfass, T. Alrutz, and T. Gerhold. Rank reordering for MPI communication optimization. Computers & Fluids, 80:372-380, 2013.
  8. A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent Advances in Graph Partitioning. In Algorithm Engineering - Selected Topics, to app., ArXiv:1311.3144, 2014.
  9. R. E. Burkard, E. Cela, P. M. Pardalos, and L. S. Pitsoulis. The quadratic assignment problem. In Handbook of combinatorial optimization, pages 1713-1809. Springer, 1998.
  10. Ü. V. Çatalyürek and C. Aykanat. Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication. In Proc. of the 3rd Int'l Workshop on Parallel Algorithms for Irregularly Structured Problems, volume 1117, pages 75-86. Springer, 1996.
  11. S. Y. Chan, T. C. Ling, and E. Aubanel. The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study. Cluster Computing, 15(3):281-302, 2012. URL: http://dx.doi.org/10.1007/s10586-012-0229-4.
  12. T. Davis. The University of Florida Sparse Matrix Collection.
  13. D. Delling, P. Sanders, D. Schultes, and D. Wagner. Engineering route planning algorithms. In Algorithmics of Large and Complex Networks, volume 5515 of LNCS State-of-the-Art Survey, pages 117-139. Springer, 2009.
  14. J. Fietz, M. Krause, C. Schulz, P. Sanders, and V. Heuveline. Optimized Hybrid Parallel Lattice Boltzmann Fluid Flow Simulations on Complex Geometries. In Proc. of Euro-Par 2012 Parallel Processing, volume 7484 of LNCS, pages 818-829. Springer, 2012.
  15. R. Glantz, H. Meyerhenke, and A. Noe. Algorithms for mapping parallel processes onto grid and torus architectures. In 23rd Euromicro Int'l Conference on Parallel, Distributed, and Network-Based Processing, pages 236-243. IEEE Computer Society, 2015. URL: http://dx.doi.org/10.1109/PDP.2015.21.
  16. T. Hatazaki. Rank reordering strategy for MPI topology creation functions. In 5th European PVM/MPI User's Group Meeting, volume 1497 of LNCS, pages 188-195. Springer, 1998.
  17. C. H. Heider. A computationally simplified pair-exchange algorithm for the quadratic assignment problem. Technical report, DTIC Document, Center for Naval Analyses Arlington VA, 1972.
  18. T. Hoefler and M. Snir. Generic topology mapping strategies for large-scale parallel architectures. In Proc. 25th Int'l Conf. on Supercomputing, pages 75-84. ACM, 2011.
  19. G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20(1):359-392, 1998.
  20. G. Mercier and J. Clet-Ortega. Towards an efficient process placement policy for MPI applications in multicore environments. In European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting (EuroMPI), pages 104-115. Springer, 2009.
  21. G. Mercier and E. Jeannot. Improving MPI applications performance on multicore clusters with rank reordering. In 18th European MPI Users' Group Meeting, pages 39-49, 2011.
  22. MPI Forum. MPI: A Message-Passing Interface Standard. Version 3.1.
  23. H. Müller-Merbach. Optimale Reihenfolgen, volume 15 of Ökonometrie und Unternehmensforschung. Springer, 1970.
  24. P. M. Pardalos and H. Wolkowicz, editors. Quadratic Assignment and Related Problems, Proceedings of a DIMACS Workshop, New Brunswick, New Jersey, USA, May 20-21, 1993, volume 16 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. DIMACS/AMS, 1994. URL: http://dimacs.rutgers.edu/Volumes/Vol16.html.
  25. F. Pellegrini. Scotch Home Page. https://www.labri.fr/perso/pelegrin/scotch/.
  26. S. Sahni and T. F. Gonzalez. P-complete approximation problems. J. ACM, 23(3):555-565, 1976. URL: http://dx.doi.org/10.1145/321958.321975.
  27. P. Sanders and C. Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In 12th Int'l Sym. on Experimental Algorithms (SEA'13), volume 7933 of LNCS. Springer, 2013.
  28. K. Schloegel, G. Karypis, and V. Kumar. Graph Partitioning for High Performance Scientific Simulations. In The Sourcebook of Parallel Computing, pages 491-541, 2003.
  29. C. Schulz and J. L. Träff. VieM v1.00 - Vienna Mapping and Sparse Quadratic Assignment User Guide. CoRR, abs/1703.05509, http://viem.taa.univie.ac.at/, 2017. URL: http://arxiv.org/abs/1703.05509.
  30. A. J. Soper, C. Walshaw, and M. Cross. A Combined Evolutionary Search and Multilevel Optimisation Approach to Graph-Partitioning. Journal of Global Optimization, 29(2):225-241, 2004.
  31. R. V. Southwell. Stress-Calculation in Frameworks by the Method of "Systematic Relaxation of Constraints". Proc. of the Royal Society of London, 151(872):56-95, 1935.
  32. J. L. Träff. Implementing the MPI process topology mechanism. In ACM/IEEE Supercomputing, 2002.
  33. J. T. Vogelstein, J. M. Conroy, V. Lyzinski, L. J. Podrazik, S. G. Kratzer, E. T. Harley, D. E. Fishkind, R. J. Vogelstein, and C. E. Priebe. Fast approximate quadratic programming for graph matching, April 2015. URL: http://dx.doi.org/10.1371/journal.pone.0121002.
  34. C. Walshaw and M. Cross. Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm. SIAM Journal on Scientific Computing, 22(1):63-80, 2000.
  35. H. Yu, I-H. Chung, and J. E. Moreira. Topology mapping for Blue Gene/L supercomputer. In ACM/IEEE Supercomputing, page 116, 2006.