Modular Recoverable Mutual Exclusion Under System-Wide Failures

Authors Sahil Dhoked , Wojciech Golab , Neeraj Mittal



PDF
Thumbnail PDF

File

LIPIcs.DISC.2023.17.pdf
  • Filesize: 1.02 MB
  • 24 pages

Document Identifiers

Author Details

Sahil Dhoked
  • Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
Wojciech Golab
  • Department of Electrical and Computer Engineering, University of Waterloo, Canada
Neeraj Mittal
  • Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA

Acknowledgements

We thank the anonymous reviewers for their helpful feedback and insightful suggestions.

Cite AsGet BibTex

Sahil Dhoked, Wojciech Golab, and Neeraj Mittal. Modular Recoverable Mutual Exclusion Under System-Wide Failures. In 37th International Symposium on Distributed Computing (DISC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 281, pp. 17:1-17:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.DISC.2023.17

Abstract

Recoverable mutual exclusion (RME) is a fault-tolerant variation of Dijkstra’s classic mutual exclusion (ME) problem that allows processes to fail by crashing as long as they recover eventually. A growing body of literature on this topic, starting with the problem formulation by Golab and Ramaraju (PODC'16), examines the cost of solving the RME problem, which is quantified by counting the expensive shared memory operations called remote memory references (RMRs), under a variety of conditions. Published results show that the RMR complexity of RME algorithms, among other factors, depends crucially on the failure model used: individual process versus system-wide. Recent work by Golab and Hendler (PODC'18) also suggests that explicit failure detection can be helpful in attaining constant RMR solutions to the RME problem in the system-wide failure model. Follow-up work by Jayanti, Jayanti, and Joshi (SPAA'23) shows that such a solution exists even without employing a failure detector, albeit this solution uses a more complex algorithmic approach. In this work, we dive deeper into the study of RMR-optimal RME algorithms for the system-wide failure model, and present contributions along multiple directions. First, we introduce the notion of withdrawing from a lock acquisition rather than resetting the lock. We use this notion to design a withdrawable RME algorithm with optimal O(1) RMR complexity for both cache-coherent (CC) and distributed shared memory (DSM) models in a modular way without using an explicit failure detector. In some sense, our technique marries the simplicity of Golab and Hendler’s algorithm with Jayanti, Jayanti and Joshi’s weaker system model. Second, we present a variation of our algorithm that supports fully dynamic process participation (i.e., both joining and leaving) in the CC model, while maintaining its constant RMR complexity. We show experimentally that our algorithm is substantially faster than Jayanti, Jayanti, and Joshi’s algorithm despite having stronger correctness properties. Finally, we establish an impossibility result for fully dynamic RME algorithms with bounded RMR complexity in the DSM model that are adaptive with respect to space, and provide a wait-free withdraw section.

Subject Classification

ACM Subject Classification
  • Theory of computation → Concurrent algorithms
Keywords
  • mutual exclusion
  • shared memory
  • persistent memory
  • fault tolerance
  • system-wide failure
  • RMR complexity
  • dynamic joining
  • dynamic leaving

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Yehuda Afek, David S. Greenberg, Michael Merritt, and Gadi Taubenfeld. Computing with faulty shared objects. Journal of the ACM (JACM), 42(6):1231-1274, 1995. Google Scholar
  2. Marcos K. Aguilera and Svend Frølund. Strict linearizability and the power of aborting. Technical Report HPL-2003-241, Hewlett-Packard Labs, 2003. Google Scholar
  3. Adam Alon and Adam Morrison. Deterministic abortable mutual exclusion with sublogarithmic adaptive RMR complexity. In Proc. of the 37th ACM Symposium on Principles of Distributed Computing (PODC), pages 27-36, 2018. Google Scholar
  4. James H. Anderson, Yong-Jik Kim, and Ted Herman. Shared-memory mutual exclusion: major research trends since 1986. Distributed Computing (DC), 16(2-3):75-110, 2003. Google Scholar
  5. Hagit Attiya, Danny Hendler, and Philipp Woelfel. Tight RMR lower bounds for mutual exclusion and other problems. In Proc. of the 40th ACM Symposium on Theory of Computing (STOC), pages 217-226, 2008. Google Scholar
  6. Ryan Berryhill, Wojciech Golab, and Mahesh Tripunitara. Robust shared objects for non-volatile main memory. In Proc. of the 19th International Conference on Principles of Distributed Systems (OPODIS), pages 20:1-20:17, 2016. Google Scholar
  7. Philip Bohannon, Daniel Lieuwen, and Avi Silberschatz. Recovering scalable spin locks. In Proc. of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP), pages 314-322, 1996. Google Scholar
  8. Philip Bohannon, Daniel Lieuwen, Avi Silberschatz, S. Sudarshan, and Jacques Gava. Recoverable user-level mutual exclusion. In Proc. of the 7th IEEE Symposium on Parallel and Distributed Processing (SPDP), pages 293-301, 1995. Google Scholar
  9. David Yu Cheng Chan, George Giakkoupis, and Philipp Woelfel. Word-size rmr trade-offs for recoverable mutual exclusion. In Proc. of the 43rd ACM Symposium on Principles of Distributed Computing (PODC), 2023. Google Scholar
  10. David Yu Cheng Chan and Philipp Woelfel. A tight lower bound for the RMR complexity of recoverable mutual exclusion. In Proc. of the 40th ACM Symposium on Principles of Distributed Computing (PODC), 2021. Google Scholar
  11. Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM), 43(2):225-267, 1996. Google Scholar
  12. Travis S. Craig. Building FIFO and priority-queuing spin locks from atomic swap. Technical Report 93-02-02, Department of Computer Science, University of Washington, 1993. Google Scholar
  13. Sahil Dhoked. Synchronization and Fault Tolerance Techniques in Concurrent Shared Memory Systems. PhD thesis, The University of Texas at Dallas, 2022. Google Scholar
  14. Sahil Dhoked, Wojciech Golab, and Neeraj Mittal. Recoverable Mutual Exclusion. Springer Nature, 2023. Google Scholar
  15. Sahil Dhoked and Neeraj Mittal. An adaptive approach to recoverable mutual exclusion. In Proc. of the 39th ACM Symposium on Principles of Distributed Computing (PODC), pages 1-10, New York, NY, USA, 2020. URL: https://doi.org/10.1145/3382734.3405739.
  16. Edsger W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the ACM (CACM), 8(9):569, 1965. Google Scholar
  17. Rotem Dvir and Gadi Taubenfeld. Mutual exclusion algorithms with constant RMR complexity and wait-free exit code. In James Aspnes, Alysson Bessani, Pascal Felber, and João Leitão, editors, Proc. of the International Conference on Principles of Distributed Systems (OPODIS), volume 95, pages 17:1-17:16, Dagstuhl, Germany, October 2017. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  18. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32:374-382, 1985. Google Scholar
  19. George Giakkoupis and Philipp Woelfel. Randomized abortable mutual exclusion with constant amortized RMR complexity on the CC model. In Proc. of the 36th ACM Symposium on Principles of Distributed Computing (PODC), pages 221-229, 2017. Google Scholar
  20. Wojciech Golab, Vassos Hadzilacos, Danny Hendler, and Philipp Woelfel. RMR-efficient implementations of comparison primitives using read and write operations. Distributed Computing (DC), 25(2):109-162, 2012. Google Scholar
  21. Wojciech Golab and Danny Hendler. Recoverable mutual exclusion in sub-logarithmic time. In Proc. of the 36th ACM Symposium on Principles of Distributed Computing (PODC), pages 211-220, 2017. Google Scholar
  22. Wojciech Golab and Danny Hendler. Recoverable mutual exclusion under system-wide failures. In Proc. of the 37th ACM Symposium on Principles of Distributed Computing (PODC), pages 17-26, 2018. Google Scholar
  23. Wojciech Golab and Aditya Ramaraju. Recoverable mutual exclusion. In Proc. of the 35th ACM Symposium on Principles of Distributed Computing (PODC), pages 65-74, 2016. Google Scholar
  24. Wojciech Golab and Aditya Ramaraju. Recoverable mutual exclusion. Distributed Computing (DC), 32(6):535-564, 2019. Google Scholar
  25. Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming, Revised Reprint. Morgan Kaufman, 2012. Google Scholar
  26. Prasad Jayanti, Siddhartha Jayanti, and Sucharita Jayanti. Towards an ideal queue lock. In Proc. of the 21st International Conference on Distributed Computing and Networking (ICDCN), January 2020. Google Scholar
  27. Prasad Jayanti, Siddhartha Jayanti, and Anup Joshi. A recoverable mutex algorithm with sub-logarithmic RMR on both CC and DSM. In Proc. of the 38th ACM Symposium on Principles of Distributed Computing (PODC), pages 177-186, 2019. Google Scholar
  28. Prasad Jayanti, Siddhartha Jayanti, and Anup Joshi. Constant rmr recoverable mutex under system-wide crashes, 2023. URL: https://arxiv.org/abs/2302.00748.
  29. Prasad Jayanti, Siddhartha Jayanti, and Anup Joshi. Constant RMR system-wide failure resilient durable locks with dynamic joining. In Proc. of the 35th ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 227-237, 2023. Google Scholar
  30. Prasad Jayanti and Siddhartha V. Jayanti. Constant amortized RMR complexity deterministic abortable mutual exclusion algorithm for CC and DSM models. In Proc. of the 38th ACM Symposium on Principles of Distributed Computing (PODC), pages 167-176, 2019. Google Scholar
  31. Prasad Jayanti and Anup Joshi. Recoverable mutual exclusion with abortability. In Mohamed Faouzi Atig and Alexander A. Schwarzmann, editors, Proc. of the International Conference on Networked Systems (NetSys), pages 217-232, 2019. Google Scholar
  32. Prasad Jayanti and Anup Joshi. Recoverable mutual exclusion with abortability. In Proc. of 7th International Conference on Networked Systems (NETYS), pages 217-232, 2019. Google Scholar
  33. Anup Joshi. Recoverable Mutual Exclusion Algorithms for Crash-Restart Shared-Memory Systems. PhD thesis, Dartmouth College, May 2020. Google Scholar
  34. Daniel Katzan and Adam Morrison. Recoverable, abortable, and adaptive mutual exclusion with sublogarithmic RMR complexity. In Proc. of the 24th International Conference on Principles of Distributed Systems (OPODIS), pages 15:1-15:16, 2021. Google Scholar
  35. Leslie Lamport. The mutual exclusion problem: part I - a theory of interprocess communication. Journal of the ACM (JACM), 33(2):313-326, 1986. Google Scholar
  36. Leslie Lamport. The mutual exclusion problem: part II - statement and solutions. Journal of the ACM (JACM), 33(2):327-348, 1986. Google Scholar
  37. Hyonho Lee. Local-spin mutual exclusion algorithms on the DSM model using fetch-&-store objects. Master’s thesis, University of Toronto, 2003. Google Scholar
  38. John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS), 9(1):21-65, 1991. Google Scholar
  39. Maged M. Michael and Yong-Jik Kim. Fault tolerant mutual exclusion locks for shared memory systems, 2009. US Patent 7,493,618. Google Scholar
  40. Thomas Moscibroda and Rotem Oshman. Resilience of mutual exclusion algorithms to transient memory faults. In Proc. of the 30th ACM Symposium on Principles of Distributed Computing (PODC), pages 69-78, 2011. Google Scholar
  41. Michel Raynal. Algorithms for Mutual Exclusion. MIT, 1986. Google Scholar
  42. Michel Raynal and Gadi Taubenfeld. Mutual exclusion in fully anonymous shared memory systems. Information Processing Letters, 158:105938, 2020. Google Scholar
  43. I. Rhee. Optimizing a FIFO, scalable spin lock using consistent memory. In Proc. of the 17th IEEE Real-Time Systems Symposium (RTSS), pages 106-114, December 1996. Google Scholar
  44. Andy Rudoff. Re: cascade lake doesn't support clwb? [discussion post]. https://groups.google.com/g/pmem/c/DRdYIc70RHc/m/rtoP681rAAAJ, 2021. Google Groups.
  45. Andy Rudoff and the Intel PMDK Team. Persistent memory development kit, 2020. [last accessed 2/11/2021]. URL: https://pmem.io/pmdk/.
  46. Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Prentice Hall, 2006. Google Scholar
  47. Gadi Taubenfeld. Coordination without prior agreement. In Elad Michael Schiller and Alexander A. Schwarzmann, editors, Proc. of the 36th ACM Symposium on Principles of Distributed Computing (PODC), pages 325-334, 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail