LIPIcs.OPODIS.2021.4.pdf
A very common optimization technique in Machine Learning is Stochastic Gradient Descent (SGD). SGD can easily be distributed: several workers estimate the gradient of a loss function, and a central parameter server aggregates these estimates. When all workers behave correctly, the more workers we have, the more accurate the gradient estimate is. We call this the Arbitrary Aggregation Accuracy (AAA) property. However, in practice, some workers may be Byzantine (i.e., behave arbitrarily). Interestingly, when a fixed fraction of workers is assumed to be Byzantine (e.g., 20%), no existing aggregation scheme has the AAA property. In this paper, we propose the first aggregation scheme that has this property despite a fixed fraction of Byzantine workers (less than 50%). We theoretically prove this property, and then illustrate it with simulations.
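To make the setting concrete, the following is a minimal sketch of the parameter-server aggregation step under Byzantine workers. It is not the scheme proposed in the paper: it only contrasts plain averaging with coordinate-wise median, a classical robust baseline, and all names and parameters are illustrative assumptions.

```python
# Illustrative sketch of one Byzantine-distributed SGD aggregation round.
# NOT the paper's aggregation scheme; names and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def honest_gradient(true_grad, noise=0.5):
    # An honest worker returns a noisy but unbiased estimate of the gradient.
    return true_grad + rng.normal(0.0, noise, size=true_grad.shape)

def byzantine_gradient(dim, scale=100.0):
    # A Byzantine worker may send anything; here it sends a large bogus vector.
    return rng.normal(0.0, scale, size=dim)

def aggregate_mean(grads):
    # Plain averaging: accurate when all workers are honest, but a single
    # Byzantine worker can shift the result arbitrarily far.
    return np.mean(grads, axis=0)

def aggregate_coordinate_median(grads):
    # Coordinate-wise median: a standard robust baseline that tolerates a
    # minority of Byzantine workers (though, per the paper, such standard
    # schemes do not have the AAA property under a fixed Byzantine fraction).
    return np.median(grads, axis=0)

# One aggregation round with 50 workers, 20% of them Byzantine.
dim, n_workers, byz_fraction = 10, 50, 0.2
true_grad = np.ones(dim)
n_byz = int(byz_fraction * n_workers)

grads = np.array(
    [byzantine_gradient(dim) for _ in range(n_byz)]
    + [honest_gradient(true_grad) for _ in range(n_workers - n_byz)]
)

print("error of mean:  ", np.linalg.norm(aggregate_mean(grads) - true_grad))
print("error of median:", np.linalg.norm(aggregate_coordinate_median(grads) - true_grad))
```

Running the sketch shows the averaged gradient being dominated by the Byzantine vectors, while the median stays close to the true gradient, which illustrates why robust aggregation is needed in this setting.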