Twins: BFT Systems Made Robust

Authors Shehar Bano, Alberto Sonnino, Andrey Chursin, Dmitri Perelman, Zekun Li, Avery Ching, Dahlia Malkhi




File

LIPIcs.OPODIS.2021.7.pdf
  • Filesize: 0.99 MB
  • 29 pages


Author Details

Shehar Bano
  • Facebook Novi, London, UK
Alberto Sonnino
  • Facebook Novi, London, UK
Andrey Chursin
  • Facebook Novi, Menlo Park, CA, USA
Dmitri Perelman
  • Facebook Novi, Menlo Park, CA, USA
Zekun Li
  • Facebook Novi, Menlo Park, CA, USA
Avery Ching
  • Facebook Novi, Menlo Park, CA, USA
Dahlia Malkhi
  • Facebook Novi, Menlo Park, CA, USA

Acknowledgements

The authors would like to thank Ben Maurer, David Dill, Daniel Xiang, Kartik Nayak, Ling Ren, and Scott Stoller for feedback on a late version of the manuscript, and George Danezis for comments on an early version. We also thank the Novi Research and Engineering teams for valuable feedback.

Cite As

Shehar Bano, Alberto Sonnino, Andrey Chursin, Dmitri Perelman, Zekun Li, Avery Ching, and Dahlia Malkhi. Twins: BFT Systems Made Robust. In 25th International Conference on Principles of Distributed Systems (OPODIS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 217, pp. 7:1-7:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.OPODIS.2021.7

Abstract

This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting "locks" guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same identities and network credentials. To the rest of the system, the twins appear indistinguishable from a single node behaving in a "questionable" manner. Twins can systematically generate Byzantine attack scenarios at scale, execute them in a controlled manner, and examine their behavior. Twins scenarios iterate over protocol rounds and vary the communication patterns among nodes. Twins runs in a production setting within DiemBFT where it can execute 44M Twins-generated scenarios daily. Whereas the system at hand did not manifest errors, subtle safety bugs that were deliberately injected for the purpose of validating the implementation of Twins itself were exposed within minutes. Twins can prevent developers from regressing correctness when updating the codebase, introducing new features, or performing routine maintenance tasks. Twins requires only a thin wrapper over DiemBFT; we thus envision other systems using it. Building on this idea, one new attack and several known attacks against other BFT protocols were materialized as Twins scenarios. In all cases, the target protocols break within fewer than a dozen protocol rounds, hence it is realistic for the Twins approach to expose the problems.
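
To make the idea of scenarios that "iterate over protocol rounds and vary the communication patterns among nodes" concrete, the following is a minimal sketch, not the generator used in DiemBFT. It assumes a hypothetical four-node system in which node "A" is Byzantine and is given a twin "A'". It also assumes that a communication pattern is a partition of the nodes into groups that can only talk among themselves, and that a scenario picks one leader and one partition per round. All names (`set_partitions`, `round_configs`, `scenarios`, `NODES`, `LEADERS`) are illustrative and do not come from the paper.

```python
from itertools import product


def set_partitions(items):
    """Yield every partition of `items` into non-empty, unordered blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or give it a new block of its own.
        yield [[first]] + partial


def round_configs(nodes, leaders):
    """All (leader, partition) combinations for a single protocol round."""
    return [(leader, part)
            for leader in leaders
            for part in set_partitions(nodes)]


def scenarios(nodes, leaders, rounds):
    """Yield every scenario: one (leader, partition) choice per round."""
    yield from product(round_configs(nodes, leaders), repeat=rounds)


# Assumed setup: "A" and "A'" are twins sharing one identity, so a single
# leader entry "A" stands for both of them.
NODES = ["A", "A'", "B", "C", "D"]
LEADERS = ["A", "B", "C", "D"]

per_round = round_configs(NODES, LEADERS)
print(len(per_round))       # 4 leaders x 52 partitions of 5 nodes = 208
print(len(per_round) ** 2)  # number of distinct 2-round scenarios = 43264
```

Even in this toy setting the number of scenarios grows exponentially with the number of rounds, which gives some intuition for why a generator of this kind would need to enumerate, prune, or sample scenarios systematically rather than rely on hand-written test cases.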

Subject Classification

ACM Subject Classification
  • Security and privacy → Distributed systems security
Keywords
  • Distributed Systems
  • Byzantine Fault Tolerance
  • Real-World Deployment
