DagSemProc.06371.3.pdf
- Filesize: 211 kB
- 12 pages
Many distributed systems are designed to tolerate the presence of Byzantine failures: an individual process may arbitrarily deviate from the algorithm assigned to it. Depending on the application requirements, systems enjoy various levels of fault-tolerance. Systems based on state machine replication are able to mask failures so that their effect is not visible to the application. In contrast, cooperative peer-to-peer systems can tolerate bounded deviant behavior to some extent and therefore do not require masking, as long as each faulty node is eventually exposed. Finding an abstract way to reason about the levels of fault-tolerance is thus of great importance. We discuss how information about deviant behavior can be abstracted in the form of a Byzantine failure detector (BFD). We formally define a BFD abstraction, and we discuss two ways of using the abstraction: (1) monitoring systems in order to retroactively detect Byzantine failures and (2) enforcing systems in order to boost their level of fault-tolerance. Interestingly, the BFD formalism allowed us to determine the relative hardness of implementing two popular abstractions in distributed computing: state machine replication and weak interactive consistency.