Fully Read/Write Fence-Free Work-Stealing with Multiplicity

Authors: Armando Castañeda, Miguel Piña

Author Details

Armando Castañeda
  • Institute of Mathematics, National Autonomous University of Mexico, Mexico City, Mexico
Miguel Piña
  • Faculty of Sciences, National Autonomous University of Mexico, Mexico City, Mexico

Cite As

Armando Castañeda and Miguel Piña. Fully Read/Write Fence-Free Work-Stealing with Multiplicity. In 35th International Symposium on Distributed Computing (DISC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 209, pp. 16:1-16:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

It is known that any algorithm for work-stealing in the standard asynchronous shared memory model must use expensive Read-After-Write synchronization patterns or atomic Read-Modify-Write instructions. Algorithms have been proposed for relaxations in the standard model, and for restricted models, that avoid this impossibility result, but only in some of their operations. This paper considers work-stealing with multiplicity, a relaxation in which every task is taken by at least one operation, with the requirement that any process can extract a task at most once. Two versions of the relaxation are considered, and two fully Read/Write algorithms are presented in the standard asynchronous shared memory model, both devoid of Read-After-Write synchronization patterns in all their operations. The second algorithm is additionally fully fence-free: no specific ordering among the algorithm’s instructions is required beyond what is implied by data dependence. To our knowledge, these are the first algorithms for work-stealing possessing all these properties. Our algorithms are also wait-free solutions of relaxed versions of single-enqueuer multi-dequeuer queues. The algorithms are obtained by reducing work-stealing with multiplicity and weak multiplicity to MaxRegister and RangeMaxRegister, respectively, the latter being a relaxation of MaxRegister that might be of independent interest. An experimental evaluation shows that our fully fence-free algorithm exhibits better performance than the Cilk THE, Chase-Lev and Idempotent Work-Stealing algorithms.
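To make the notions of MaxRegister and multiplicity concrete, the following is a minimal, single-threaded Python sketch. It is not the paper's algorithm. It assumes a simple collect-based max register with one single-writer slot per process, and a hypothetical `ToyMultiplicityQueue` whose `head` index lives in that register. The names `read_phase`, `write_phase` and `extract` are invented for illustration. Concurrency is simulated by interleaving the two phases by hand, so the sketch says nothing about memory ordering or fences.

```python
class MaxRegister:
    """Collect-based max register: one single-writer slot per process."""

    def __init__(self, n_procs):
        self.slots = [0] * n_procs

    def max_write(self, pid, value):
        # A process writes only its own slot, and only upward.
        if value > self.slots[pid]:
            self.slots[pid] = value

    def max_read(self):
        # A "collect": read every slot and return the largest value seen.
        return max(self.slots)


class ToyMultiplicityQueue:
    """Hypothetical illustration of extraction with multiplicity."""

    def __init__(self, n_procs):
        self.tasks = []                   # appended to only by the owner
        self.head = MaxRegister(n_procs)  # index of the next task to extract

    def put(self, task):
        self.tasks.append(task)

    def read_phase(self):
        # Uses only reads: find the current head and the task stored there.
        h = self.head.max_read()
        if h >= len(self.tasks):
            return None                   # queue looks empty
        return h, self.tasks[h]

    def write_phase(self, pid, h):
        # Uses only a write: advance the head past the extracted task.
        self.head.max_write(pid, h + 1)

    def extract(self, pid):
        snapshot = self.read_phase()
        if snapshot is None:
            return None
        h, task = snapshot
        self.write_phase(pid, h)
        return task


q = ToyMultiplicityQueue(n_procs=3)
q.put("a")
q.put("b")

# Simulated overlap: processes 1 and 2 both read before either one writes,
# so both obtain task "a". This duplication is what multiplicity permits.
s1 = q.read_phase()
s2 = q.read_phase()
q.write_phase(1, s1[0])
q.write_phase(2, s2[0])

# After its own write, a process always reads a larger head,
# so it never extracts the same task a second time.
next_for_1 = q.extract(1)   # "b"
next_for_2 = q.extract(2)   # None: nothing is left
```

In this toy version no task is lost, because the head only advances past an index after some process has read the task stored there, and a task is duplicated only when extractions overlap.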

Subject Classification

ACM Subject Classification
  • Theory of computation → Distributed computing models
  • Computing methodologies → Distributed algorithms
  • Computing methodologies → Concurrent algorithms
  • Theory of computation → Concurrent algorithms
  • Theory of computation → Distributed algorithms

Keywords
  • Correctness condition
  • Linearizability
  • Nonblocking
  • Relaxed data type
  • Set-linearizability
  • Wait-freedom
  • Work-stealing



