A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

Author Ruslan Nikolaev

Thumbnail PDF


  • Filesize: 0.69 MB
  • 16 pages

Document Identifiers

Author Details

Ruslan Nikolaev
  • Virginia Tech, Blacksburg, VA, USA


We thank the anonymous reviewers for their valuable feedback.

Cite AsGet BibTex

Ruslan Nikolaev. A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue. In 33rd International Symposium on Distributed Computing (DISC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 146, pp. 28:1-28:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms.

Subject Classification

ACM Subject Classification
  • Theory of computation → Concurrent algorithms
  • FIFO
  • queue
  • ring buffer
  • lock-free
  • non-blocking


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Arm Limited. ARM Architecture Reference Manual. http://developer.arm.com/, 2019.
  2. Jason Evans. A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada, 2006. Google Scholar
  3. Panagiota Fatourou and Nikolaos D. Kallimanis. Revisiting the Combining Synchronization Technique. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 257-266, New York, NY, USA, 2012. ACM. Google Scholar
  4. Steven Feldman and Damian Dechev. A Wait-free Multi-producer Multi-consumer Ring Buffer. ACM SIGAPP Applied Computing Review, 15(3):59-71, October 2015. Google Scholar
  5. Eric Freudenthal and Allan Gottlieb. Process Coordination with Fetch-and-increment. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pages 260-268, 1991. Google Scholar
  6. Danny Hendler, Nir Shavit, and Lena Yerushalmi. A Scalable Lock-free Stack Algorithm. In Proceedings of the 16th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '04, pages 206-215, New York, NY, USA, 2004. ACM. Google Scholar
  7. IBM. PowerPC Architecture Book, Version 2.02. Book I: PowerPC User Instruction Set Architecture. http://www.ibm.com/developerworks/, 2005.
  8. Intel. Intel 64 and IA-32 Architectures Developer’s Manual. http://www.intel.com/, 2019.
  9. Christoph M. Kirsch, Michael Lippautz, and Hannes Payer. Fast and Scalable, Lock-Free k-FIFO Queues. In Proceedings of the 12th International Conference on Parallel Computing Technologies - Volume 7979, pages 208-223, Berlin, Heidelberg, 2013. Springer-Verlag. Google Scholar
  10. Alexander Krizhanovsky. Lock-free Multi-producer Multi-consumer Queue on Ring Buffer. Linux J., 2013(228), 2013. URL: http://dl.acm.org/citation.cfm?id=2492102.2492106.
  11. Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, 28(9):690-691, September 1979. Google Scholar
  12. Liblfds. Lock-free Data Structure Library. URL: http://www.liblfds.org.
  13. Liblfds. Ringbuffer Disappointment. URL: http://www.liblfds.org/wordpress/index.php/2016/04/29/ringbuffer-disappointment/.
  14. Lockless Inc. Memory Allocator Benchmarks. https://locklessinc.com/benchmarks_allocator.shtml, 2019.
  15. Maged M. Michael. Hazard pointers: safe memory reclamation for lock-free objects. IEEE Transactions on Parallel and Distributed Systems, 15(6):491-504, June 2004. Google Scholar
  16. Maged M. Michael and Michael L. Scott. Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors. Journal of Parallel and Distributed Computing, 51(1):1-26, May 1998. Google Scholar
  17. MIPS. MIPS32/MIPS64 Rev. 6.06. http://www.mips.com/products/architectures/, 2019.
  18. Mark Moir, Daniel Nussbaum, Ori Shalev, and Nir Shavit. Using Elimination to Implement Scalable and Lock-free FIFO Queues. In Proceedings of the 17th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '05, pages 253-262, 2005. Google Scholar
  19. Adam Morrison and Yehuda Afek. Fast Concurrent Queues for x86 Processors. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 103-112, New York, NY, USA, 2013. ACM. Google Scholar
  20. Oracle. SPARC Architecture 2011. http://www.oracle.com/, 2019.
  21. RISC-V Foundation. RISC-V Books. http://riscv.org/risc-v-books/, 2019.
  22. Philippas Tsigas and Yi Zhang. A Simple, Fast and Scalable Non-blocking Concurrent FIFO Queue for Shared Memory Multiprocessor Systems. In Proceedings of the 13th ACM Symposium on Parallel Algorithms and Architectures, SPAA '01, pages 134-143, 2001. Google Scholar
  23. Dmitry Vyukov. Bounded MPMC queue. URL: http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue.
  24. Chaoran Yang and John Mellor-Crummey. A Wait-free Queue As Fast As Fetch-and-add. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '16, pages 16:1-16:13, New York, NY, USA, 2016. ACM. Google Scholar