Lock Oscillation: Boosting the Performance of Concurrent Data Structures

Fatourou, Panagiota; Kallimanis, Nikolaos D.

doi:10.4230/LIPIcs.OPODIS.2017.8

File

Subject Classification

Keywords

Synchronization
concurrent data structures
combining

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

In combining-based synchronization, two main parameters that affect performance are the com- bining degree of the synchronization algorithm, i.e. the average number of requests that each com- biner serves, and the number of expensive synchronization primitives (like CAS, Swap, etc.) that it performs. The value of the first parameter must be high, whereas the second must be kept low. In this paper, we present Osci, a new combining technique that shows remarkable perform- ance when paired with cheap context switching. We experimentally show that Osci significantly outperforms all previous combining algorithms. Specifically, the throughput of Osci is higher than that of previously presented combining techniques by more than an order of magnitude. Notably, Osci’s throughput is much closer to the ideal than all previous algorithms, while keep- ing the average latency in serving each request low. We evaluated the performance of Osci in two different multiprocessor architectures, namely AMD and Intel. Based on Osci, we implement and experimentally evaluate implementations of concurrent queues and stacks. These implementations outperform by far all current state-of-the-art concur- rent queue and stack implementations. Although the current version of Osci has been evaluated in an environment supporting user-level threads, it would run correctly on any threading library, preemptive or not (including kernel threads).

Cite As Get BibTex

Panagiota Fatourou and Nikolaos D. Kallimanis. Lock Oscillation: Boosting the Performance of Concurrent Data Structures. In 21st International Conference on Principles of Distributed Systems (OPODIS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 95, pp. 8:1-8:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.OPODIS.2017.8

Author Details

Panagiota Fatourou

Nikolaos D. Kallimanis

References

Java threads in the solaris environment - earlier releases. URL: http://docs.oracle.com/cd/E19455-01/806-3461/6jck06gqe/.
User-mode scheduling in windows operating synstem. URL: http://msdn.microsoft.com/en-us/library/windows/desktop/dd627187.
Thomas E Anderson, Brian N Bershad, Edward D Lazowska, and Henry M Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Transactions on Computer Systems (TOCS), 10(1):53-79, 1992.
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117-128, 2000.
Neil CC Brown. C++ csp2: A many-to-many threading model for multicore architectures. Communicating Process Architectures 2007: WoTUG-30, pages 183-205, 2007.
Irina Calciu, Justin E Gottschlich, and Maurice Herlihy. Using elimination and delegation to implement a scalable numa-friendly stack. In Usenix Workshop on Hot Topics in Parallelism (HotPar), 2013.
T. S. Craig. Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington, 1993.
Panagiota Fatourou and Nikolaos D. Kallimanis. A Highly-Efficient Wait-Free Universal Construction. In Proceedings of the 23nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 325-334, 2011.
Panagiota Fatourou and Nikolaos D. Kallimanis. Revisiting the combining synchronization technique. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 257-266. ACM, 2012.
Matteo Frigo, Charles E Leiserson, and Keith H Randall. The implementation of the cilk-5 multithreaded language. In ACM Sigplan Notices, volume 33, pages 212-223. ACM, 1998.
J. R. Goodman, M.K. Vernon, and P. J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 64-75, April 1989.
Panagiotis E Hadjidoukas and Vassilios V Dimakopoulos. Nested parallelism in the ompi openmp/c compiler. In Euro-Par 2007 Parallel Processing, pages 662-671. Springer, 2007.
Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. The source code for flat-combining. URL: http://github.com/mit-carbon/Flat-Combining.
Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 355-364, 2010.
Maurice Herlihy and Jeannette M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12(3):463-492, 1990.
Brandon Holt, Jacob Nelson, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. Flat combining synchronized global data structures. In 7th International Conference on PGAS Programming Models, 2013.
Randy H Katz, Susan J Eggers, David A Wood, CL Perkins, and Robert G Sheldon. Implementing a cache consistency protocol, volume 13. ACM, 1985.
David Klaftenegger, Konstantinos Sagonas, and Kjell Winblad. Queue delegation locking. 2014.
Peter S. Magnusson, Anders Landin, and Erik Hagersten. Queue Locks on Cache Coherent Multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium, pages 165-171, 1994.
Simon Marlow, S Peyton Jones, et al. The glasgow haskell compiler, 2004.
John M. Mellor-Crummey and Michael L. Scott. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Transactions on Computer Systems, 9(1):21-65, 1991.
Maged M. Michael and Michael L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pages 267-275, 1996.
Seung-Hyun Min, Kwang-Ho Chun, Young-Rok Yang, and Myoung-Jun Kim. A soft real-time guaranteed java m: N thread mapping method. In Knowledge-Based Intelligent Information and Engineering Systems, pages 1075-1080. Springer, 2005.
Adam Morrison and Yehuda Afek. The source code for LCRQ. . URL: http://mcg.cs.tau.ac.il/projects/lcrq.
Adam Morrison and Yehuda Afek. Fast concurrent queues for x86 processors. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 103-112. ACM, 2013.
Y. Oyama, K. Taura, and A. Yonezawa. Executing parallel programs with synchronization bottlenecks efficiently. In Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, pages 182-204, 1999.
R. K. Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986.
Philippas Tsigas and Yi Zhang. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In SPAA, pages 134-143, 2001. URL: http://dx.doi.org/10.1145/378580.378611.
Philippas Tsigas, Yi Zhang, Daniel Cederman, and Tord Dellsen. Wait-free queue algorithms for the real-time java specification. In 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), 4-7 April 2006, San Jose, California, USA, pages 373-383. IEEE Computer Society, 2006. URL: http://dx.doi.org/10.1109/RTAS.2006.45.
Kyle B Wheeler, Richard C Murphy, and Douglas Thain. Qthreads: An api for programming with millions of lightweight threads. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-8. IEEE, 2008.
Pen-Chung Yew, Nian-Feng Tzeng, and Duncan H. Lawrie. Distributing hot-spot addressing in large-scale multiprocessors. Computers, IEEE Transactions on, 100(4):388-395, 1987.

Lock Oscillation: Boosting the Performance of Concurrent Data Structures

Authors Panagiota Fatourou, Nikolaos D. Kallimanis

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message