Arbitrarily Accurate Aggregation Scheme for Byzantine SGD

Maurer, Alexandre

doi:10.4230/LIPIcs.OPODIS.2021.4

Abstract

A very common optimization technique in Machine Learning is Stochastic Gradient Descent (SGD). SGD can easily be distributed: several workers try to estimate the gradient of a loss function, and a central parameter server gathers these estimates. When all workers behave correctly, the more workers we have, the more accurate the gradient estimate is. We call this the Arbitrary Aggregation Accuracy (AAA) property.
However, in practice, some workers may be Byzantine (i.e., have an arbitrary behavior). Interestingly, when a fixed fraction of workers is assumed to be Byzantine (e.g. 20%), no existing aggregation scheme has the AAA property. In this paper, we propose the first aggregation scheme that has this property despite a fixed fraction of Byzantine workers (less than 50%). We theoretically prove this property, and then illustrate it with simulations.

Cite As Get BibTex

Alexandre Maurer. Arbitrarily Accurate Aggregation Scheme for Byzantine SGD. In 25th International Conference on Principles of Distributed Systems (OPODIS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 217, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.OPODIS.2021.4

Author Details

Alexandre Maurer

School of Computer Science, UM6P, Ben Guerir, Morocco

Supplementary Materials

Software (Source Code) https://tinyurl.com/sim-aaa-paper

References

Dan Alistarh, Zeyuan Allen-Zhu, and Jerry Li. Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 4613-4623, 2018.
Gilad Baruch, Moran Baruch, and Yoav Goldberg. A little is enough: Circumventing defenses for distributed learning. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, NeurIPS 2019, pages 8632-8642, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/ec1c59141046cd1866bbbcdfb6ae31d4-Abstract.html.
Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30, pages 119-129. Curran Associates, Inc., 2017.
Amine Boussetta, El-Mahdi El-Mhamdi, Rachid Guerraoui, Alexandre Maurer, and Sébastien Rouault. AKSEL: Fast Byzantine SGD. In 24th International Conference on Principles of Distributed Systems (OPODIS 2020), 2021.
Saikiran Bulusu, Prashant Khanduri, Pranay Sharma, and Pramod K. Varshney. On distributed stochastic gradient descent for nonconvex functions in the presence of byzantines. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pages 3137-3141. IEEE, 2020. URL: https://doi.org/10.1109/ICASSP40776.2020.9052956.
Lingjiao Chen, H. Wang, Zachary B. Charles, and Dimitris Papailiopoulos. Draco: Byzantine-resilient distributed training via redundant gradients. In ICML, 2018.
Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2):44, 2017.
Anna Choromanska, Mikael Henaff, Michaël Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9-12, 2015, 2015. URL: http://proceedings.mlr.press/v38/choromanska15.html.
Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Arsany Guirguis, and Sébastien Rouault. Aggregathor: Byzantine machine learning via robust gradient aggregation. In SysML, 2019.
Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Rhicheek Patra, Mahsa Taziki, et al. Asynchronous byzantine machine learning (the case of sgd). In ICML, pages 1153-1162, 2018.
El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, and Lê Nguyên Hoang. Geniunely distributed byzantine machine learning. In PODC, 2020.
El Mahdi El Mhamdi, Rachid Guerraoui, and Sébastien Rouault. The hidden vulnerability of distributed learning in Byzantium. In Proceedings of the 35th International Conference on Machine Learning, pages 3521-3530. PMLR, 2018.
Arthur Jacot, Clément Hongler, and Franck Gabriel. Neural tangent kernel: Convergence and generalization in neural networks. In NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 8580-8589, 2018. URL: https://proceedings.neurips.cc/paper/2018/hash/5a4be1fa34e62bb8a6ec6b91d2462f5a-Abstract.html.
Kishori M. Konwar, Sanguthevar Rajasekaran, and Alexander A. Shvartsman. Robust network supercomputing with malicious processes. In Shlomi Dolev, editor, Distributed Computing, 20th International Symposium, DISC 2006, Stockholm, Sweden, September 18-20, 2006, Proceedings, volume 4167 of Lecture Notes in Computer Science, pages 474-488. Springer, 2006. URL: https://doi.org/10.1007/11864219_33.
Leslie Lamport, Robert Shostak, and Marshall Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382-401, 1982.
Alexandre Maurer. Source code for simulations related to this paper. URL: https://tinyurl.com/sim-aaa-paper.
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
Lili Su and Nitin H. Vaidya. Fault-tolerant multi-agent optimization: Optimal iterative distributed algorithms. In George Giakkoupis, editor, Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25-28, 2016, pages 425-434. ACM, 2016. URL: https://doi.org/10.1145/2933057.2933105.
Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Generalized byzantine-tolerant sgd, 2018.
Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Phocas: dimensional byzantine-resilient stochastic gradient descent, 2018.
Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5650-5659. PMLR, 2018.

Arbitrarily Accurate Aggregation Scheme for Byzantine SGD

Author Alexandre Maurer

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message