Robert Gifford, Qingjie Lu, Andreas Haeberlen, Linh Thi Xuan Phan
Creative Commons Attribution 4.0 International license
Distributed systems are commonly built using a set of standard assumptions: we assume that message delays are unbounded, that any packet can be lost in the network, and that clocks cannot be closely synchronized. On the one hand, these conservative assumptions result in robust systems that can operate reliably in a wide variety of conditions. On the other hand, they also force the system to do a lot of complex ad hoc coordination and thus limit the performance it can achieve. In this paper, we take a look at what lies beyond this standard model. We observe that, on modern hardware in a single-tenant data center, distributed systems are able to closely coordinate and essentially "run like clockwork" with very little effort. If we are willing to additionally rule out some worst-case failure scenarios, this results in a large performance improvement, both in practice and even in theory. We demonstrate this effect using state-machine replication (SMR) as a case study: our SMR protocol, Watchmaker, exceeds the throughput of state-of-the-art algorithms by two orders of magnitude, and it requires only half as many replicas to tolerate the same number of faults.
@InProceedings{newatia_et_al:OASIcs.NINeS.2026.26,
  author =	{Newatia, Karan and Gifford, Robert and Lu, Qingjie and Haeberlen, Andreas and Phan, Linh Thi Xuan},
  title =	{{Running Distributed Systems like Clockwork}},
  booktitle =	{1st New Ideas in Networked Systems (NINeS 2026)},
  pages =	{26:1--26:31},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-414-7},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{139},
  editor =	{Argyraki, Katerina and Panda, Aurojit},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.NINeS.2026.26},
  URN =		{urn:nbn:de:0030-drops-256115},
  doi =		{10.4230/OASIcs.NINeS.2026.26},
  annote =	{Keywords: State-machine replication, distributed systems, data centers, clock synchronization, fault tolerance, synchrony}
}