OASIcs.ICCSW.2014.3.pdf
- Filesize: 0.96 MB
- 8 pages
Exploiting the parallelism of increasingly heterogeneous parallel architectures is challenging because parallelism management is complex. To achieve high performance portability whilst preserving high productivity, high-level approaches to parallel programming delegate parallelism management, such as partitioning and work distribution, to the compiler and the run-time system. Random work stealing has proved efficient for well-structured workloads, but it neglects potentially useful context information that can be obtained through static analysis or run-time monitoring and used to improve load balancing, especially for irregular applications with highly varying thread granularity and thread creation patterns. We investigate the effectiveness of an adaptive work distribution scheme to improve load balancing for an extension of Haskell which provides a deterministic parallel programming model and supports both shared-memory and distributed-memory architectures. This scheme uses a less random form of work stealing that takes into account information on past stealing successes and failures. We quantify run-time performance, communication overhead, and stealing success of four divide-and-conquer and data-parallel applications for three different update intervals on a commodity 64-core Beowulf cluster of multi-cores.
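To make the idea of history-aware stealing concrete, the following is a minimal Python sketch of one way a worker might bias victim selection using past stealing successes and failures. It is an illustration under stated assumptions, not the scheme evaluated in the paper: the class `StealHistory`, the smoothed success-ratio weighting, and the halving of counters every `update_interval` attempts are all hypothetical choices made for this example.

```python
import random


class StealHistory:
    """Per-worker record of past steal outcomes (hypothetical structure)."""

    def __init__(self, victims, update_interval=10):
        # victim -> [successes, failures]; all counters start at zero.
        self.stats = {v: [0, 0] for v in victims}
        # Assumption: history is aged once every `update_interval` attempts.
        self.update_interval = update_interval
        self.attempts = 0

    def weight(self, victim):
        # Smoothed success ratio: an unseen victim gets 0.5, so every
        # victim keeps a non-zero chance of being selected.
        successes, failures = self.stats[victim]
        return (successes + 1) / (successes + failures + 2)

    def choose_victim(self, rng=random):
        # Still randomised, but biased towards victims that previously had work.
        victims = list(self.stats)
        weights = [self.weight(v) for v in victims]
        return rng.choices(victims, weights=weights, k=1)[0]

    def record(self, victim, success):
        # Store the outcome of one steal attempt.
        self.stats[victim][0 if success else 1] += 1
        self.attempts += 1
        if self.attempts % self.update_interval == 0:
            # Age the history by halving all counters, so that stale
            # information gradually loses influence.
            for counts in self.stats.values():
                counts[0] //= 2
                counts[1] //= 2
```

A worker that runs out of local work would call `choose_victim()`, send a steal request to the returned node, and report the outcome with `record(victim, success)`. With all counters at zero the weights are equal, so the sketch falls back to uniform random stealing.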