Leaderless State-Machine Replication: Specification, Properties, Limits

Authors Tuanir França Rezende, Pierre Sutra



PDF
Thumbnail PDF

File

LIPIcs.DISC.2020.24.pdf
  • Filesize: 0.62 MB
  • 17 pages

Document Identifiers

Author Details

Tuanir França Rezende
  • Telecom SudParis, Évry, France
Pierre Sutra
  • Telecom SudParis, Évry, France

Acknowledgements

The authors thank Vitor Enes and Alexey Gotsman for fruitful discussions on Leaderless SMR.

Cite AsGet BibTex

Tuanir França Rezende and Pierre Sutra. Leaderless State-Machine Replication: Specification, Properties, Limits. In 34th International Symposium on Distributed Computing (DISC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 179, pp. 24:1-24:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.DISC.2020.24

Abstract

Modern Internet services commonly replicate critical data across several geographical locations using state-machine replication (SMR). Due to their reliance on a leader replica, classical SMR protocols offer limited scalability and availability in this setting. To solve this problem, recent protocols follow instead a leaderless approach, in which each replica is able to make progress using a quorum of its peers. In this paper, we study this new emerging class of SMR protocols and states some of their limits. We first propose a framework that captures the essence of leaderless state-machine replication (Leaderless SMR). Then, we introduce a set of desirable properties for these protocols: (R)eliability, (O)ptimal (L)atency and (L)oad Balancing. We show that protocols matching all of the ROLL properties are subject to a trade-off between performance and reliability. We also establish a lower bound on the message delay to execute a command in protocols optimal for the ROLL properties. This lower bound explains the persistent chaining effect observed in experimental results.

Subject Classification

ACM Subject Classification
  • General and reference → Performance
  • Software and its engineering → Distributed systems organizing principles
  • Theory of computation → Distributed computing models
Keywords
  • Fault Tolerance
  • State Machine Replication
  • Consensus

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Marcos Kawazoe Aguilera and Robert E. Strom. Efficient atomic broadcast using deterministic merge. In Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’00, page 209–218, New York, NY, USA, 2000. Association for Computing Machinery. URL: https://doi.org/10.1145/343477.343620.
  2. Karolos Antoniadis, Rachid Guerraoui, Dahlia Malkhi, and Dragos-Adrian Seredinschi. State machine replication is more expensive than consensus. In 32nd International Symposium on Distributed Computing, DISC 2018, New Orleans, LA, USA, October 15-19, 2018, pages 7:1-7:18, 2018. URL: https://doi.org/10.4230/LIPIcs.DISC.2018.7.
  3. Balaji Arun, Sebastiano Peluso, Roberto Palmieri, Giuliano Losa, and Binoy Ravindran. Speeding up consensus by chasing fast decisions. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 49-60, 2017. Google Scholar
  4. Marijke H. L. Bodlaender, Magnús M. Halldórsson, Christian Konrad, and Fabian Kuhn. Brief announcement: Local independent set approximation. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25-28, 2016, pages 93-95, 2016. URL: https://doi.org/10.1145/2933057.2933068.
  5. Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering, page 421–430, USA, 2001. IEEE Computer Society. Google Scholar
  6. T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Communications of the ACM, 43(2):225-267, 1996. URL: http://www.acm.org/pubs/toc/Abstracts/jacm/226647.html.
  7. Jiaqing Du, Daniele Sciascia, Sameh Elnikety, Willy Zwaenepoel, and Fernando Pedone. Clock-RSM: Low-Latency Inter-datacenter State Machine Replication Using Loosely Synchronized Physical Clocks. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA, June 23-26, 2014, pages 343-354, 2014. URL: https://doi.org/10.1109/DSN.2014.42.
  8. Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, and Pierre Sutra. State-machine replication for planet-scale systems. In Proceedings of the Fifteenth European Conference on Computer Systems, EuroSys ’20, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3342195.3387543.
  9. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374-382, April 1985. URL: https://doi.org/10.1145/3149.214121.
  10. Eli Gafni. Round-by-round fault detectors (extended abstract): unifying synchrony and asynchrony. In Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, PODC '98, pages 143-152, New York, NY, USA, 1998. ACM. Google Scholar
  11. Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51-59, June 2002. URL: https://doi.org/10.1145/564585.564601.
  12. Jim Gray and Leslie Lamport. Consensus on transaction commit. ACM Trans. Database Syst., 31(1):133–160, March 2006. URL: https://doi.org/10.1145/1132863.1132867.
  13. Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16(2):133-169, 1998. Google Scholar
  14. Leslie Lamport. Generalized consensus and Paxos. Technical Report MSR-TR-2005-33, Microsoft Research, 2005. Google Scholar
  15. Leslie Lamport. Fast paxos. Distributed Computing, 19(2):79-103, October 2006. Google Scholar
  16. Leslie Lamport. Lower bounds for asynchronous consensus. Distributed Computing, 19(2):104-125, 2006. Google Scholar
  17. Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996. Google Scholar
  18. Yanhua Mao, Flavio P. Junqueira, and Keith Marzullo. Mencius: Building efficient replicated state machines for wans. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 369-384, 2008. Google Scholar
  19. Iulian Moraru, David G. Andersen, and Michael Kaminsky. There is more consensus in egalitarian parliaments. In ACM Symposium on Operating Systems Principles (SOSP), pages 358-372, 2013. Google Scholar
  20. Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference (USENIX ATC), pages 305-320, 2014. Google Scholar
  21. Fernando Pedone and André Schiper. Generic broadcast. In International Symposium on Distributed Computing (DISC), pages 94-108, 1999. Google Scholar
  22. Tuanir França Rezende and Pierre Sutra. Leaderless state-machine replication: Specification, properties, limits (extended version), 2020. URL: http://arxiv.org/abs/2008.02512.
  23. André Schiper. Early consensus in an asynchronous system with a weak failure detector. Distrib. Comput., 10(3):149–157, April 1997. URL: https://doi.org/10.1007/s004460050032.
  24. R. Schmidt, L. Camargos, and F. Pedone. Collision-fast atomic broadcast. In 2014 IEEE 28th International Conference on Advanced Information Networking and Applications, pages 1065-1072, 2014. Google Scholar
  25. Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4):299-319, 1990. Google Scholar
  26. Alexandru Turcu, Sebastiano Peluso, Roberto Palmieri, and Binoy Ravindran. Be general and don't give up consistency in geo-replicated transactional systems. In International Conference on Principles of Distributed Systems (OPODIS), pages 33-48, 2014. Google Scholar
  27. Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, and Lau Cheuk Lung. Spin one’s wheels? byzantine fault tolerance with a spinning primary. In Proceedings of the 2009 28th IEEE International Symposium on Reliable Distributed Systems, SRDS ’09, page 135–144, USA, 2009. IEEE Computer Society. URL: https://doi.org/10.1109/SRDS.2009.36.
  28. Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph M. Hellerstein, and Ion Stoica. "bipartisan paxos: A family of fast, leaderless, modular state machine replication protocols". preprint on webpage at URL: https://mwhittaker.github.io/publications/bipartisan_paxos.pdf.
  29. Piotr Zieliński. Optimistic generic broadcast. In Pierre Fraigniaud, editor, Distributed Computing, pages 369-383, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail