Lock Oscillation: Boosting the Performance of Concurrent Data Structures

Authors Panagiota Fatourou, Nikolaos D. Kallimanis

Thumbnail PDF


  • Filesize: 1.58 MB
  • 17 pages

Document Identifiers

Author Details

Panagiota Fatourou
Nikolaos D. Kallimanis

Cite AsGet BibTex

Panagiota Fatourou and Nikolaos D. Kallimanis. Lock Oscillation: Boosting the Performance of Concurrent Data Structures. In 21st International Conference on Principles of Distributed Systems (OPODIS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 95, pp. 8:1-8:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


In combining-based synchronization, two main parameters that affect performance are the com- bining degree of the synchronization algorithm, i.e. the average number of requests that each com- biner serves, and the number of expensive synchronization primitives (like CAS, Swap, etc.) that it performs. The value of the first parameter must be high, whereas the second must be kept low. In this paper, we present Osci, a new combining technique that shows remarkable perform- ance when paired with cheap context switching. We experimentally show that Osci significantly outperforms all previous combining algorithms. Specifically, the throughput of Osci is higher than that of previously presented combining techniques by more than an order of magnitude. Notably, Osci’s throughput is much closer to the ideal than all previous algorithms, while keep- ing the average latency in serving each request low. We evaluated the performance of Osci in two different multiprocessor architectures, namely AMD and Intel. Based on Osci, we implement and experimentally evaluate implementations of concurrent queues and stacks. These implementations outperform by far all current state-of-the-art concur- rent queue and stack implementations. Although the current version of Osci has been evaluated in an environment supporting user-level threads, it would run correctly on any threading library, preemptive or not (including kernel threads).
  • Synchronization
  • concurrent data structures
  • combining


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Java threads in the solaris environment - earlier releases. URL: http://docs.oracle.com/cd/E19455-01/806-3461/6jck06gqe/.
  2. User-mode scheduling in windows operating synstem. URL: http://msdn.microsoft.com/en-us/library/windows/desktop/dd627187.
  3. Thomas E Anderson, Brian N Bershad, Edward D Lazowska, and Henry M Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Transactions on Computer Systems (TOCS), 10(1):53-79, 1992. Google Scholar
  4. Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117-128, 2000. Google Scholar
  5. Neil CC Brown. C++ csp2: A many-to-many threading model for multicore architectures. Communicating Process Architectures 2007: WoTUG-30, pages 183-205, 2007. Google Scholar
  6. Irina Calciu, Justin E Gottschlich, and Maurice Herlihy. Using elimination and delegation to implement a scalable numa-friendly stack. In Usenix Workshop on Hot Topics in Parallelism (HotPar), 2013. Google Scholar
  7. T. S. Craig. Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington, 1993. Google Scholar
  8. Panagiota Fatourou and Nikolaos D. Kallimanis. A Highly-Efficient Wait-Free Universal Construction. In Proceedings of the 23nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 325-334, 2011. Google Scholar
  9. Panagiota Fatourou and Nikolaos D. Kallimanis. Revisiting the combining synchronization technique. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 257-266. ACM, 2012. Google Scholar
  10. Matteo Frigo, Charles E Leiserson, and Keith H Randall. The implementation of the cilk-5 multithreaded language. In ACM Sigplan Notices, volume 33, pages 212-223. ACM, 1998. Google Scholar
  11. J. R. Goodman, M.K. Vernon, and P. J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 64-75, April 1989. Google Scholar
  12. Panagiotis E Hadjidoukas and Vassilios V Dimakopoulos. Nested parallelism in the ompi openmp/c compiler. In Euro-Par 2007 Parallel Processing, pages 662-671. Springer, 2007. Google Scholar
  13. Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. The source code for flat-combining. URL: http://github.com/mit-carbon/Flat-Combining.
  14. Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 355-364, 2010. Google Scholar
  15. Maurice Herlihy and Jeannette M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12(3):463-492, 1990. Google Scholar
  16. Brandon Holt, Jacob Nelson, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. Flat combining synchronized global data structures. In 7th International Conference on PGAS Programming Models, 2013. Google Scholar
  17. Randy H Katz, Susan J Eggers, David A Wood, CL Perkins, and Robert G Sheldon. Implementing a cache consistency protocol, volume 13. ACM, 1985. Google Scholar
  18. David Klaftenegger, Konstantinos Sagonas, and Kjell Winblad. Queue delegation locking. 2014. Google Scholar
  19. Peter S. Magnusson, Anders Landin, and Erik Hagersten. Queue Locks on Cache Coherent Multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium, pages 165-171, 1994. Google Scholar
  20. Simon Marlow, S Peyton Jones, et al. The glasgow haskell compiler, 2004. Google Scholar
  21. John M. Mellor-Crummey and Michael L. Scott. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Transactions on Computer Systems, 9(1):21-65, 1991. Google Scholar
  22. Maged M. Michael and Michael L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pages 267-275, 1996. Google Scholar
  23. Seung-Hyun Min, Kwang-Ho Chun, Young-Rok Yang, and Myoung-Jun Kim. A soft real-time guaranteed java m: N thread mapping method. In Knowledge-Based Intelligent Information and Engineering Systems, pages 1075-1080. Springer, 2005. Google Scholar
  24. Adam Morrison and Yehuda Afek. The source code for LCRQ. . URL: http://mcg.cs.tau.ac.il/projects/lcrq.
  25. Adam Morrison and Yehuda Afek. Fast concurrent queues for x86 processors. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 103-112. ACM, 2013. Google Scholar
  26. Y. Oyama, K. Taura, and A. Yonezawa. Executing parallel programs with synchronization bottlenecks efficiently. In Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, pages 182-204, 1999. Google Scholar
  27. R. K. Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986. Google Scholar
  28. Philippas Tsigas and Yi Zhang. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In SPAA, pages 134-143, 2001. URL: http://dx.doi.org/10.1145/378580.378611.
  29. Philippas Tsigas, Yi Zhang, Daniel Cederman, and Tord Dellsen. Wait-free queue algorithms for the real-time java specification. In 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), 4-7 April 2006, San Jose, California, USA, pages 373-383. IEEE Computer Society, 2006. URL: http://dx.doi.org/10.1109/RTAS.2006.45.
  30. Kyle B Wheeler, Richard C Murphy, and Douglas Thain. Qthreads: An api for programming with millions of lightweight threads. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-8. IEEE, 2008. Google Scholar
  31. Pen-Chung Yew, Nian-Feng Tzeng, and Duncan H. Lawrie. Distributing hot-spot addressing in large-scale multiprocessors. Computers, IEEE Transactions on, 100(4):388-395, 1987. Google Scholar