Fedor Ryabinin, Alexey Gotsman, Pierre Sutra
Creative Commons Attribution 4.0 International license
Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are not co-located with it. In response to these drawbacks, Egalitarian Paxos [Iulian Moraru et al., 2013] introduced an alternative, leaderless approach that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput with up to f crashes of any processes out of a total of n = 2f+1. The protocol furthermore allows any process to execute a command c fast, in 2 message delays, provided no more than e = ⌈(f+1)/2⌉ other processes fail and all concurrently submitted commands commute with c; the latter condition is often satisfied in practical systems.
Egalitarian Paxos has served as a foundation for many other replication protocols. Unfortunately, the protocol is very complex, ambiguously specified, and suffers from nontrivial bugs. In this paper, we present EPaxos*, a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds f and e such that n ≥ max{2e+f-1, 2f+1}, a number of processes that we show to be optimal.
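
The thresholds quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names `epaxos_fast_threshold` and `min_processes` are hypothetical, and the code simply evaluates the two formulas stated above, e = ⌈(f+1)/2⌉ and n ≥ max{2e+f-1, 2f+1}.

```python
def epaxos_fast_threshold(f: int) -> int:
    """Fast-path threshold e = ceil((f + 1) / 2) quoted for Egalitarian Paxos.

    Hypothetical helper name; only the formula comes from the abstract.
    """
    # For non-negative integers, ceil((f + 1) / 2) equals (f + 2) // 2.
    return (f + 2) // 2


def min_processes(f: int, e: int) -> int:
    """Smallest n satisfying n >= max(2e + f - 1, 2f + 1), as stated in the abstract.

    Hypothetical helper name; only the bound comes from the abstract.
    """
    return max(2 * e + f - 1, 2 * f + 1)


if __name__ == "__main__":
    # With the Egalitarian Paxos choice of e, the bound evaluates to n = 2f + 1.
    for f in range(1, 6):
        e = epaxos_fast_threshold(f)
        print(f"f={f} e={e} n>={min_processes(f, e)}")
```

Under these formulas, choosing e = ⌈(f+1)/2⌉ never pushes the bound above 2f+1, which is consistent with the n = 2f+1 configuration described for Egalitarian Paxos; a larger e (for example f = 2, e = 3) requires more processes.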
@InProceedings{ryabinin_et_al:LIPIcs.OPODIS.2025.22,
author = {Ryabinin, Fedor and Gotsman, Alexey and Sutra, Pierre},
title = {{Making Democracy Work: Fixing and Simplifying Egalitarian Paxos}},
booktitle = {29th International Conference on Principles of Distributed Systems (OPODIS 2025)},
pages = {22:1--22:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-409-3},
ISSN = {1868-8969},
year = {2026},
volume = {361},
editor = {Arusoaie, Andrei and Onica, Emanuel and Spear, Michael and Tucci-Piergiovanni, Sara},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2025.22},
URN = {urn:nbn:de:0030-drops-251955},
doi = {10.4230/LIPIcs.OPODIS.2025.22},
annote = {Keywords: Consensus, state-machine replication, fault tolerance}
}