DROPS

Document

DOI: 10.4230/LIPIcs.DISC.2025.39

TEE Is Not a Healer: Rollback-Resistant Reliable Storage

Authors: Sadegh Keshavarzi, Gregory Chockler, and Alexey Gotsman

Published in: LIPIcs, Volume 356, 39th International Symposium on Distributed Computing (DISC 2025)

Abstract

Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and state within a secure enclave known as a Trusted Execution Environment (TEE). However, the protection offered by a TEE only applies during program execution. Once power is switched off, the non-volatile portion of the program state becomes vulnerable to rollback attacks wherein it is undetectably reverted to an older version. In this paper we consider the problem of implementing reliable read/write registers out of failure-prone replicas subject to state rollbacks. To this end, we introduce a new unified model that captures multiple failure types that can affect a TEE-based system and establish tight bounds on the fault-tolerance of register constructions in this model. We consider both the static case, where failure thresholds hold throughout the entire execution, and the dynamic case, where any number of replicas can roll back, provided these failures do not occur too often. Our dynamic register emulation algorithm, TEE-Rex , provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.

Cite as

Sadegh Keshavarzi, Gregory Chockler, and Alexey Gotsman. TEE Is Not a Healer: Rollback-Resistant Reliable Storage. In 39th International Symposium on Distributed Computing (DISC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 356, pp. 39:1-39:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{keshavarzi_et_al:LIPIcs.DISC.2025.39,
  author =	{Keshavarzi, Sadegh and Chockler, Gregory and Gotsman, Alexey},
  title =	{{TEE Is Not a Healer: Rollback-Resistant Reliable Storage}},
  booktitle =	{39th International Symposium on Distributed Computing (DISC 2025)},
  pages =	{39:1--39:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-402-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{356},
  editor =	{Kowalski, Dariusz R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2025.39},
  URN =		{urn:nbn:de:0030-drops-248560},
  doi =		{10.4230/LIPIcs.DISC.2025.39},
  annote =	{Keywords: Trusted execution environments, fault tolerance, crash recovery}
}

Document

DOI: 10.4230/LIPIcs.DISC.2024.10

Vertical Atomic Broadcast and Passive Replication

Authors: Manuel Bravo, Gregory Chockler, Alexey Gotsman, Alejandro Naser-Pastoriza, and Christian Roldán

Published in: LIPIcs, Volume 319, 38th International Symposium on Distributed Computing (DISC 2024)

Abstract

Atomic broadcast is a reliable communication abstraction ensuring that all processes deliver the same set of messages in a common global order. It is a fundamental building block for implementing fault-tolerant services using either active (aka state-machine) or passive (aka primary-backup) replication. We consider the problem of implementing reconfigurable atomic broadcast, which further allows users to dynamically alter the set of participating processes, e.g., in response to failures or changes in the load. We give a complete safety and liveness specification of this communication abstraction and propose a new protocol implementing it, called Vertical Atomic Broadcast, which uses an auxiliary service to facilitate reconfiguration. In contrast to prior proposals, our protocol significantly reduces system downtime when reconfiguring from a functional configuration by allowing it to continue processing messages while agreement on the next configuration is in progress. Furthermore, we show that this advantage can be maintained even when our protocol is modified to support a stronger variant of atomic broadcast required for passive replication.

Cite as

Manuel Bravo, Gregory Chockler, Alexey Gotsman, Alejandro Naser-Pastoriza, and Christian Roldán. Vertical Atomic Broadcast and Passive Replication. In 38th International Symposium on Distributed Computing (DISC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 319, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bravo_et_al:LIPIcs.DISC.2024.10,
  author =	{Bravo, Manuel and Chockler, Gregory and Gotsman, Alexey and Naser-Pastoriza, Alejandro and Rold\'{a}n, Christian},
  title =	{{Vertical Atomic Broadcast and Passive Replication}},
  booktitle =	{38th International Symposium on Distributed Computing (DISC 2024)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-352-2},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{319},
  editor =	{Alistarh, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2024.10},
  URN =		{urn:nbn:de:0030-drops-212363},
  doi =		{10.4230/LIPIcs.DISC.2024.10},
  annote =	{Keywords: Reconfiguration, consensus, replication}
}

Document

DOI: 10.4230/LIPIcs.DISC.2024.11

What Cannot Be Implemented on Weak Memory?

Authors: Armando Castañeda, Gregory Chockler, Brijesh Dongol, and Ori Lahav

Published in: LIPIcs, Volume 319, 38th International Symposium on Distributed Computing (DISC 2024)

Abstract

We present a general methodology for establishing the impossibility of implementing certain concurrent objects on different (weak) memory models. The key idea behind our approach lies in characterizing memory models by their mergeability properties, identifying restrictions under which independent memory traces can be merged into a single valid memory trace. In turn, we show that the mergeability properties of the underlying memory model entail similar mergeability requirements on the specifications of objects that can be implemented on that memory model. We demonstrate the applicability of our approach to establish the impossibility of implementing standard distributed objects with different restrictions on memory traces on three memory models: strictly consistent memory, total store order, and release-acquire. These impossibility results allow us to identify tight and almost tight bounds for some objects, as well as new separation results between weak memory models, and between well-studied objects based on their implementability on weak memory models.

Cite as

Armando Castañeda, Gregory Chockler, Brijesh Dongol, and Ori Lahav. What Cannot Be Implemented on Weak Memory?. In 38th International Symposium on Distributed Computing (DISC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 319, pp. 11:1-11:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{castaneda_et_al:LIPIcs.DISC.2024.11,
  author =	{Casta\~{n}eda, Armando and Chockler, Gregory and Dongol, Brijesh and Lahav, Ori},
  title =	{{What Cannot Be Implemented on Weak Memory?}},
  booktitle =	{38th International Symposium on Distributed Computing (DISC 2024)},
  pages =	{11:1--11:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-352-2},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{319},
  editor =	{Alistarh, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2024.11},
  URN =		{urn:nbn:de:0030-drops-212371},
  doi =		{10.4230/LIPIcs.DISC.2024.11},
  annote =	{Keywords: Impossibility, Weak Memory Models, Total-Store Order, Release-Acquire}
}

Document

DOI: 10.4230/LIPIcs.OPODIS.2023.21

Fault-Tolerant Computing with Unreliable Channels

Authors: Alejandro Naser-Pastoriza, Gregory Chockler, and Alexey Gotsman

Published in: LIPIcs, Volume 286, 27th International Conference on Principles of Distributed Systems (OPODIS 2023)

Abstract

We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels.

Cite as

Alejandro Naser-Pastoriza, Gregory Chockler, and Alexey Gotsman. Fault-Tolerant Computing with Unreliable Channels. In 27th International Conference on Principles of Distributed Systems (OPODIS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 286, pp. 21:1-21:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{naserpastoriza_et_al:LIPIcs.OPODIS.2023.21,
  author =	{Naser-Pastoriza, Alejandro and Chockler, Gregory and Gotsman, Alexey},
  title =	{{Fault-Tolerant Computing with Unreliable Channels}},
  booktitle =	{27th International Conference on Principles of Distributed Systems (OPODIS 2023)},
  pages =	{21:1--21:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-308-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{286},
  editor =	{Bessani, Alysson and D\'{e}fago, Xavier and Nakamura, Junya and Wada, Koichi and Yamauchi, Yukiko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2023.21},
  URN =		{urn:nbn:de:0030-drops-195118},
  doi =		{10.4230/LIPIcs.OPODIS.2023.21},
  annote =	{Keywords: Consensus, network partitions, liveness, synchronizers}
}

Document

DOI: 10.4230/LIPIcs.DISC.2022.12

Liveness and Latency of Byzantine State-Machine Replication

Authors: Manuel Bravo, Gregory Chockler, and Alexey Gotsman

Published in: LIPIcs, Volume 246, 36th International Symposium on Distributed Computing (DISC 2022)

Abstract

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyze the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Cite as

Manuel Bravo, Gregory Chockler, and Alexey Gotsman. Liveness and Latency of Byzantine State-Machine Replication. In 36th International Symposium on Distributed Computing (DISC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 246, pp. 12:1-12:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bravo_et_al:LIPIcs.DISC.2022.12,
  author =	{Bravo, Manuel and Chockler, Gregory and Gotsman, Alexey},
  title =	{{Liveness and Latency of Byzantine State-Machine Replication}},
  booktitle =	{36th International Symposium on Distributed Computing (DISC 2022)},
  pages =	{12:1--12:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-255-6},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{246},
  editor =	{Scheideler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2022.12},
  URN =		{urn:nbn:de:0030-drops-172037},
  doi =		{10.4230/LIPIcs.DISC.2022.12},
  annote =	{Keywords: Replication, blockchain, partial synchrony, liveness}
}

Document

DOI: 10.4230/LIPIcs.DISC.2020.23

Making Byzantine Consensus Live

Authors: Manuel Bravo, Gregory Chockler, and Alexey Gotsman

Published in: LIPIcs, Volume 179, 34th International Symposium on Distributed Computing (DISC 2020)

Abstract

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including HotStuff, Tendermint, PBFT and SBFT. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Cite as

Manuel Bravo, Gregory Chockler, and Alexey Gotsman. Making Byzantine Consensus Live. In 34th International Symposium on Distributed Computing (DISC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 179, pp. 23:1-23:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bravo_et_al:LIPIcs.DISC.2020.23,
  author =	{Bravo, Manuel and Chockler, Gregory and Gotsman, Alexey},
  title =	{{Making Byzantine Consensus Live}},
  booktitle =	{34th International Symposium on Distributed Computing (DISC 2020)},
  pages =	{23:1--23:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-168-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{179},
  editor =	{Attiya, Hagit},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2020.23},
  URN =		{urn:nbn:de:0030-drops-131013},
  doi =		{10.4230/LIPIcs.DISC.2020.23},
  annote =	{Keywords: Byzantine consensus, blockchain, partial synchrony, liveness}
}

Document

DOI: 10.4230/LIPIcs.DISC.2018.14

Multi-Shot Distributed Transaction Commit

Authors: Gregory Chockler and Alexey Gotsman

Published in: LIPIcs, Volume 121, 32nd International Symposium on Distributed Computing (DISC 2018)

Abstract

Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.

Cite as

Gregory Chockler and Alexey Gotsman. Multi-Shot Distributed Transaction Commit. In 32nd International Symposium on Distributed Computing (DISC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 121, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{chockler_et_al:LIPIcs.DISC.2018.14,
  author =	{Chockler, Gregory and Gotsman, Alexey},
  title =	{{Multi-Shot Distributed Transaction Commit}},
  booktitle =	{32nd International Symposium on Distributed Computing (DISC 2018)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-092-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{121},
  editor =	{Schmid, Ulrich and Widder, Josef},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2018.14},
  URN =		{urn:nbn:de:0030-drops-98038},
  doi =		{10.4230/LIPIcs.DISC.2018.14},
  annote =	{Keywords: Atomic commit problem, two-phase commit, Paxos}
}

Document

Keynote

DOI: 10.4230/LIPIcs.OPODIS.2015.4

Space Bounds for Reliable Storage: Fundamental Limits of Coding (Keynote)

Authors: Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, and Idit Keidar

Published in: LIPIcs, Volume 46, 19th International Conference on Principles of Distributed Systems (OPODIS 2015)

Abstract

We present here a synopsis of a keynote presentation given by Idit Keidar at OPODIS 2015, the International Conference on Principles of Distributed Systems, which took place in Rennes, France, on December 14-17 2015.

Cite as

Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, and Idit Keidar. Space Bounds for Reliable Storage: Fundamental Limits of Coding (Keynote). In 19th International Conference on Principles of Distributed Systems (OPODIS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 46, pp. 4:1-4:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{spiegelman_et_al:LIPIcs.OPODIS.2015.4,
  author =	{Spiegelman, Alexander and Cassuto, Yuval and Chockler, Gregory and Keidar, Idit},
  title =	{{Space Bounds for Reliable Storage: Fundamental Limits of Coding}},
  booktitle =	{19th International Conference on Principles of Distributed Systems (OPODIS 2015)},
  pages =	{4:1--4:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-98-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{46},
  editor =	{Anceaume, Emmanuelle and Cachin, Christian and Potop-Butucaru, Maria},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2015.4},
  URN =		{urn:nbn:de:0030-drops-65957},
  doi =		{10.4230/LIPIcs.OPODIS.2015.4},
  annote =	{Keywords: distributed storage, impossibility}
}

Search Results

Documents authored by Chockler, Gregory

TEE Is Not a Healer: Rollback-Resistant Reliable Storage

Abstract

Cite as

Vertical Atomic Broadcast and Passive Replication

Abstract

Cite as

What Cannot Be Implemented on Weak Memory?

Abstract

Cite as

Fault-Tolerant Computing with Unreliable Channels

Abstract

Cite as

Liveness and Latency of Byzantine State-Machine Replication

Abstract

Cite as

Making Byzantine Consensus Live

Abstract

Cite as

Multi-Shot Distributed Transaction Commit

Abstract

Cite as

Space Bounds for Reliable Storage: Fundamental Limits of Coding (Keynote)

Abstract

Cite as

Thanks for your feedback!

Could not send message