Parallel Combining: Benefits of Explicit Synchronization

Authors Vitaly Aksenov, Petr Kuznetsov, Anatoly Shalyto

Thumbnail PDF


  • Filesize: 0.59 MB
  • 16 pages

Document Identifiers

Author Details

Vitaly Aksenov
  • ITMO University, Saint-Petersburg, Russia and Inria, Paris, France
Petr Kuznetsov
  • LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
Anatoly Shalyto
  • ITMO University, Saint-Petersburg, Russia

Cite AsGet BibTex

Vitaly Aksenov, Petr Kuznetsov, and Anatoly Shalyto. Parallel Combining: Benefits of Explicit Synchronization. In 22nd International Conference on Principles of Distributed Systems (OPODIS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 125, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


A parallel batched data structure is designed to process synchronized batches of operations on the data structure using a parallel program. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel batched one. The idea is that we explicitly synchronize concurrent operations into batches: one of the processes becomes a combiner which collects concurrent requests and initiates a parallel batched algorithm involving the owners (clients) of the collected requests. Intuitively, the cost of synchronizing the concurrent calls can be compensated by running the parallel batched algorithm. We validate the intuition via two applications. First, we use parallel combining to design a concurrent data structure optimized for read-dominated workloads, taking a dynamic graph data structure as an example. Second, we use a novel parallel batched priority queue to build a concurrent one. In both cases, we obtain performance gains with respect to the state-of-the-art algorithms.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Concurrent computing methodologies
  • Computing methodologies → Parallel computing methodologies
  • Theory of computation → Distributed computing models
  • Theory of computation → Parallel computing models
  • concurrent data structure
  • parallel batched data structure
  • combining


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Umut A. Acar, Vitaly Aksenov, and Sam Westrick. Brief Announcement: Parallel Dynamic Tree Contraction via Self-Adjusting Computation. In SPAA, pages 275-277. ACM, 2017. Google Scholar
  2. Kunal Agrawal, Jeremy T Fineman, Kefu Lu, Brendan Sheridan, Jim Sukha, and Robert Utterback. Provably good scheduling for parallel programs that use data structures through implicit batching. In SPAA, pages 84-95. ACM, 2014. Google Scholar
  3. Vitaly Aksenov, Petr Kuznetsov, and Anatoly Shalyto. Parallel Combining: Benefits of Explicit Synchronization. CoRR, abs/1710.07588, 2018. URL:
  4. Guy E Blelloch, Daniel Ferizovic, and Yihan Sun. Just join for parallel ordered sets. In SPAA, pages 253-264. ACM, 2016. Google Scholar
  5. Robert D Blumofe and Charles E Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5):720-748, 1999. Google Scholar
  6. OpenMP Architecture Review Board. OpenMP Application Interface., 2008.
  7. Luc Bougé, Joaquim Gabarro, Xavier Messeguer, Nicolas Schabanel, et al. Height-relaxed AVL rebalancing: A unified, fine-grained approach to concurrent dictionaries. Technical report, ENS Lyon, 1998. Google Scholar
  8. Anastasia Braginsky, Nachshon Cohen, and Erez Petrank. CBPQ: High performance lock-free priority queue. In Euro-Par, pages 460-474. Springer, 2016. Google Scholar
  9. Gerth Stølting Brodal, Jesper Larsson Träff, and Christos D Zaroliagis. A parallel priority queue with constant time operations. Journal of Parallel and Distributed Computing, 49(1):4-21, 1998. Google Scholar
  10. Armando Castañeda, Sergio Rajsbaum, and Michel Raynal. Specifying Concurrent Problems: Beyond Linearizability and up to Tasks. In DISC, pages 420-435, 2015. Google Scholar
  11. Thomas H Cormen, Charles Eric Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. The MIT press, 3rd edition, 2009. Google Scholar
  12. Travis Craig. Building FIFO and priorityqueuing spin locks from atomic swap. Technical report, Technical Report TR 93-02-02, University of Washington, 02 1993., 1993. Google Scholar
  13. Narsingh Deo and Sushil Prasad. Parallel heap: An optimal parallel priority queue. The Journal of Supercomputing, 6(1):87-98, 1992. Google Scholar
  14. Dana Drachsler-Cohen and Erez Petrank. LCD: Local Combining on Demand. In OPODIS, pages 355-371. Springer, 2014. Google Scholar
  15. Panagiota Fatourou and Nikolaos D Kallimanis. A highly-efficient wait-free universal construction. In SPAA, pages 325-334. ACM, 2011. Google Scholar
  16. Panagiota Fatourou and Nikolaos D Kallimanis. Revisiting the combining synchronization technique. In ACM SIGPLAN Notices, volume 47, pages 257-266. ACM, 2012. Google Scholar
  17. Matteo Frigo, Charles E Leiserson, and Keith H Randall. The implementation of the Cilk-5 multithreaded language. ACM Sigplan Notices, 33(5):212-223, 1998. Google Scholar
  18. Phillip B Gibbons. A more practical PRAM model. In SPAA, pages 158-168. ACM, 1989. Google Scholar
  19. Gaston H Gonnet and J Ian Munro. Heaps on heaps. SIAM Journal on Computing, 15(4):964-971, 1986. Google Scholar
  20. Rajiv Gupta and Charles R Hill. A scalable implementation of barrier synchronization using an adaptive combining tree. International Journal of Parallel Programming, 18(3):161-180, 1989. Google Scholar
  21. Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In SPAA, pages 355-364, 2010. Google Scholar
  22. Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Scalable Flat-Combining Based Synchronous Queues. In DISC, pages 79-93. Springer, 2010. Google Scholar
  23. Maurice Herlihy and Nir Shavit. The art of multiprocessor programming. Morgan Kaufmann, 2012. Google Scholar
  24. Maurice Herlihy and Jeannette M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Program. Lang. Syst., 12(3):463-492, 1990. Google Scholar
  25. Jacob Holm, Kristian De Lichtenberg, and Mikkel Thorup. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. Journal of the ACM (JACM), 48(4):723-760, 2001. Google Scholar
  26. Brandon Holt, Jacob Nelson, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. Flat combining synchronized global data structures. In 7th International Conference on PGAS Programming Models, page 76, 2013. Google Scholar
  27. Joseph JáJá. An introduction to parallel algorithms, volume 17. Addison-Wesley Reading, 1992. Google Scholar
  28. Jonatan Lindén and Bengt Jonsson. A skiplist-based concurrent priority queue with minimal memory contention. In International Conference On Principles Of Distributed Systems, pages 206-220. Springer, 2013. Google Scholar
  29. John M Mellor-Crummey and Michael L Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS), 9(1):21-65, 1991. Google Scholar
  30. Gil Neiger. Set-Linearizability. In PODC, page 396, 1994. Google Scholar
  31. Yoshihiro Oyama, Kenjiro Taura, and Akinori Yonezawa. Executing parallel programs with synchronization bottlenecks efficiently. In Proceedings of the International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, volume 16, 1999. Google Scholar
  32. Maria Cristina Pinotti and Geppino Pucci. Parallel priority queues. Information Processing Letters, 40(1):33-40, 1991. Google Scholar
  33. Peter Sanders. Randomized priority queues for fast parallel access. Journal of Parallel and Distributed Computing, 49(1):86-97, 1998. Google Scholar
  34. Nir Shavit and Itay Lotan. Skiplist-based concurrent priority queues. In IPDPS, pages 263-268. IEEE, 2000. Google Scholar
  35. Nir Shavit and Asaph Zemach. Diffracting trees. ACM Transactions on Computer Systems (TOCS), 14(4):385-428, 1996. Google Scholar
  36. Nir Shavit and Asaph Zemach. Combining funnels: a dynamic approach to software combining. Journal of Parallel and Distributed Computing, 60(11):1355-1387, 2000. Google Scholar
  37. Leslie G Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990. Google Scholar
  38. Pen-Chung Y, Nian-Feng T, et al. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Transactions on Computers, 100(4):388-395, 1987. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail