The Weakest Failure Detector for Genuine Atomic Multicast

Author Pierre Sutra



Author Details

Pierre Sutra
  • Telecom SudParis, Palaiseau, France

Cite As

Pierre Sutra. The Weakest Failure Detector for Genuine Atomic Multicast. In 36th International Symposium on Distributed Computing (DISC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 246, pp. 35:1-35:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.DISC.2022.35

Abstract

Atomic broadcast is a group communication primitive to order messages across a set of distributed processes. Atomic multicast is its natural generalization where each message m is addressed to dst(m), a subset of the processes called its destination group. A solution to atomic multicast is genuine when a process takes steps only if a message is addressed to it. Genuine solutions are the ones used in practice because they have better performance. Let 𝒢 be the set of all destination groups and ℱ be the cyclic families in it, that is, the subsets of 𝒢 whose intersection graph is Hamiltonian. This paper establishes that the weakest failure detector to solve genuine atomic multicast is 𝜇 = (∧_{g,h ∈ 𝒢} Σ_{g ∩ h}) ∧ (∧_{g ∈ 𝒢} Ω_g) ∧ γ, where Σ_P and Ω_P are the quorum and leader failure detectors restricted to the processes in P, and γ is a new failure detector that informs the processes in a cyclic family f ∈ ℱ when f is faulty. We also study two classical variations of atomic multicast. The first variation requires that message delivery follows the real-time order. In this case, 𝜇 must be strengthened with 1^{g ∩ h}, the indicator failure detector that informs each process in g ∪ h when g ∩ h is faulty. The second variation requires a message to be delivered when the destination group runs in isolation. We prove that its weakest failure detector is at least 𝜇 ∧ (∧_{g, h ∈ 𝒢} Ω_{g ∩ h}). This bound is attained when ℱ = ∅.
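To make the notion of a cyclic family concrete, the following sketch enumerates ℱ for a small set of destination groups by brute force. It is an illustration only, not code from the paper. It assumes that two groups are adjacent in the intersection graph exactly when they share a process, and that a Hamiltonian graph needs at least three vertices. The function names (`intersects`, `is_hamiltonian`, `cyclic_families`) and the example groups are hypothetical.

```python
from itertools import combinations, permutations


def intersects(g, h):
    """Two groups are adjacent in the intersection graph when they share a process."""
    return bool(g & h)


def is_hamiltonian(family):
    """Brute-force check that the intersection graph of `family` has a Hamiltonian cycle.

    Assumption: a cycle needs at least three vertices.
    """
    groups = list(family)
    if len(groups) < 3:
        return False
    first, rest = groups[0], groups[1:]
    # Fix the first vertex and try every ordering of the remaining ones.
    for order in permutations(rest):
        cycle = (first, *order)
        n = len(cycle)
        if all(intersects(cycle[i], cycle[(i + 1) % n]) for i in range(n)):
            return True
    return False


def cyclic_families(groups):
    """Return every subset of `groups` whose intersection graph is Hamiltonian."""
    groups = [frozenset(g) for g in groups]
    return [
        frozenset(family)
        for size in range(3, len(groups) + 1)
        for family in combinations(groups, size)
        if is_hamiltonian(family)
    ]


# Hypothetical example: processes 1..4 and four destination groups.
ring = [{1, 2}, {2, 3}, {3, 1}, {1, 4}]
chain = [{1, 2}, {2, 3}, {3, 4}]

ring_families = cyclic_families(ring)
chain_families = cyclic_families(chain)
```

In the `chain` example no family is cyclic, so ℱ = ∅, which is the case the abstract singles out for the second variation. The enumeration is exponential in the number of groups and is meant only for small examples.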

Subject Classification

ACM Subject Classification
  • Theory of computation → Distributed computing models
  • Software and its engineering → Distributed systems organizing principles
  • General and reference → Reliability
Keywords
  • Failure Detector
  • State Machine Replication
  • Consensus

