Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network

Authors: Qinzi Zhang, Lewis Tseng



File

LIPIcs.OPODIS.2020.7.pdf
  • Filesize: 0.49 MB
  • 16 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.OPODIS.2020.7

Author Details

Qinzi Zhang
  • Boston College, Chestnut Hill, MA, USA
Lewis Tseng
  • Boston College, Chestnut Hill, MA, USA

Acknowledgements

The authors would like to thank Nitin H. Vaidya and the anonymous reviewers for their helpful comments.

Cite As

Qinzi Zhang and Lewis Tseng. Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network. In 24th International Conference on Principles of Distributed Systems (OPODIS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 184, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.OPODIS.2020.7

Abstract

In the past few years, many Byzantine-tolerant distributed machine learning (DML) algorithms have been proposed in the point-to-point communication model. In this paper, we focus on a popular DML framework - the parameter server computation paradigm and iterative learning algorithms that proceed in rounds, e.g., [Gupta and Vaidya, 2020; El-Mhamdi et al., 2020; Chen et al., 2017]. One limitation of prior algorithms in this domain is their high communication complexity. All the Byzantine-tolerant DML algorithms that we are aware of need to send n d-dimensional vectors from the worker nodes to the parameter server in each round, where n is the number of workers and d is the number of dimensions of the feature space (which may be on the order of millions). In a wireless network, power consumption is proportional to the number of bits transmitted. Consequently, it is extremely difficult, if not impossible, to deploy these algorithms on power-limited wireless devices. Motivated by this observation, we aim to reduce the communication complexity of Byzantine-tolerant DML algorithms in the single-hop radio network [Alistarh et al., 2010; Bhandari and Vaidya, 2005; Koo, 2004]. Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020 [Gupta and Vaidya, 2020], we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism that utilizes the broadcast properties of the radio network to avoid transmitting raw gradients (full d-dimensional vectors). In the radio network, each worker is able to overhear previous gradients that were transmitted to the parameter server. Roughly speaking, in Echo-CGC, if a worker "agrees" with a combination of prior gradients, it broadcasts an "echo message" instead of its raw local gradient. The echo message contains a vector of coefficients (of size at most n) and the ratio of the magnitudes of two gradients (a single float). In comparison, traditional approaches need to send n local gradients in each round, where each gradient is typically a vector in an ultra-high-dimensional space (d ≫ n). The improvement in communication complexity of our algorithm depends on multiple factors, including the number of nodes, the number of faulty workers in an execution, and the cost function. We analyze the improvement numerically, and show that with a large number of nodes, Echo-CGC reduces communication by 80% under standard assumptions.
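To make the mechanism above concrete, here is a minimal Python sketch of the worker-side decision in an Echo-CGC-style round. It is an illustration under our own assumptions: the function name `build_worker_message`, the least-squares fit, and the `tol` threshold are hypothetical stand-ins, not the paper's actual acceptance condition, which is built on the CGC filter of Gupta and Vaidya [Gupta and Vaidya, 2020].

```python
import numpy as np

def build_worker_message(local_grad, overheard_grads, tol=1e-3):
    """Illustrative sketch of a worker's choice in an Echo-CGC-style round.

    local_grad:      this worker's d-dimensional gradient.
    overheard_grads: gradients already broadcast this round and overheard
                     on the single-hop radio channel (at most n of them).
    tol:             hypothetical agreement threshold (an assumption here;
                     the paper's acceptance rule differs in detail).
    """
    if overheard_grads:
        # Try to express the local gradient as a linear combination of
        # the gradients overheard so far.
        A = np.stack(overheard_grads, axis=1)          # shape (d, k), k <= n
        coeffs, *_ = np.linalg.lstsq(A, local_grad, rcond=None)
        approx = A @ coeffs
        norm_approx = np.linalg.norm(approx)
        # "Agree" if the combination is close to the local gradient.
        if norm_approx > 0 and (
            np.linalg.norm(local_grad - approx)
            <= tol * np.linalg.norm(local_grad)
        ):
            ratio = np.linalg.norm(local_grad) / norm_approx
            # Echo message: k <= n coefficients plus one float ...
            return ("echo", coeffs, ratio)
    # ... otherwise fall back to broadcasting the raw d-dimensional gradient.
    return ("raw", local_grad)
```

Because the parameter server overhears the same broadcasts, it can reconstruct an echoed gradient from the coefficients and the magnitude ratio alone. The size argument is then simple arithmetic: an echo carries at most n + 1 floats versus d floats for a raw gradient, so with, say, n = 100 workers and d = 10^6 parameters, each echo is roughly four orders of magnitude smaller than a raw broadcast.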

Subject Classification

ACM Subject Classification
  • Computing methodologies → Distributed computing methodologies
  • Computing methodologies → Machine learning
Keywords
  • Distributed Machine Learning
  • Single-hop Radio Network
  • Byzantine Fault
  • Communication Complexity
  • Wireless Communication
  • Parameter Server


References

  1. Dan Alistarh, Seth Gilbert, Rachid Guerraoui, Zarko Milosevic, and Calvin Newport. Securing every bit: Authenticated broadcast in radio networks. In Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '10, pages 50–59, New York, NY, USA, 2010. Association for Computing Machinery. URL: https://doi.org/10.1145/1810479.1810489.
  2. Baruch Awerbuch, Andrea Richa, and Christian Scheideler. A jamming-resistant MAC protocol for single-hop wireless networks. In Proceedings of the Twenty-Seventh ACM Symposium on Principles of Distributed Computing, PODC '08, pages 45–54, New York, NY, USA, 2008. Association for Computing Machinery. URL: https://doi.org/10.1145/1400751.1400759.
  3. Vartika Bhandari and Nitin H. Vaidya. On reliable broadcast in a radio network. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of Distributed Computing, PODC '05, pages 138–147, New York, NY, USA, 2005. Association for Computing Machinery. URL: https://doi.org/10.1145/1073814.1073841.
  4. Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In NIPS'17, pages 118–128, Red Hook, NY, USA, 2017. Curran Associates Inc.
  5. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, USA, 2004.
  6. Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Meas. Anal. Comput. Syst., 1(2), December 2017. URL: https://doi.org/10.1145/3154503.
  7. Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Rhicheek Patra, and Mahsa Taziki. Asynchronous Byzantine machine learning (the case of SGD). In Jennifer Dy and Andreas Krause, editors, Proceedings of Machine Learning Research, volume 80, pages 1145–1154, Stockholmsmässan, Stockholm, Sweden, 2018. PMLR. URL: http://proceedings.mlr.press/v80/damaskinos18a.html.
  8. El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, and Sébastien Rouault. Genuinely distributed Byzantine machine learning. In Proceedings of the 39th Symposium on Principles of Distributed Computing, PODC '20, pages 355–364, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3382734.3405695.
  9. J. Fan and J. Lv. A selective overview of variable selection in high dimensional feature space. Statistica Sinica, pages 101–148, January 2010.
  10. E. J. Gumbel. The maxima of the mean largest value and of the range. The Annals of Mathematical Statistics, 25(1):76–84, 1954. URL: http://www.jstor.org/stable/2236513.
  11. Nirupam Gupta and Nitin H. Vaidya. Fault-tolerance in distributed optimization: The case of redundancy. In Proceedings of the 39th Symposium on Principles of Distributed Computing, PODC '20, pages 365–374, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3382734.3405748.
  12. H. O. Hartley and H. A. David. Universal bounds for mean range and extreme observation. The Annals of Mathematical Statistics, 25(1):85–99, 1954. URL: http://www.jstor.org/stable/2236514.
  13. Seyyedali Hosseinalipour, Christopher G. Brinton, Vaneet Aggarwal, Huaiyu Dai, and Mung Chiang. From federated learning to fog learning: Towards large-scale distributed machine learning in heterogeneous wireless networks, 2020. URL: http://arxiv.org/abs/2006.03594.
  14. Chiu-Yuen Koo. Broadcast in radio networks tolerating Byzantine adversarial behavior. In Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, PODC '04, pages 275–282, New York, NY, USA, 2004. Association for Computing Machinery. URL: https://doi.org/10.1145/1011767.1011807.
  15. Mu Li, David G. Andersen, Alexander Smola, and Kai Yu. Communication efficient distributed machine learning with the parameter server. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS'14, pages 19–27, Cambridge, MA, USA, 2014. MIT Press.
  16. Ruben Mayer and Hans-Arno Jacobsen. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput. Surv., 53(1), February 2020. URL: https://doi.org/10.1145/3363554.
  17. V. Navda, A. Bohra, S. Ganguly, and D. Rubenstein. Using channel hopping to increase 802.11 resilience to jamming attacks. In IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, pages 2526–2530, 2007.
  18. Nickos Papadatos. Maximum variance of order statistics. Annals of the Institute of Statistical Mathematics, 47:185–193, February 1995. URL: https://doi.org/10.1007/BF00773423.
  19. Lili Su and Nitin H. Vaidya. Fault-tolerant multi-agent optimization: Optimal iterative distributed algorithms. In George Giakkoupis, editor, Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25-28, 2016, pages 425–434. ACM, 2016. URL: https://doi.org/10.1145/2933057.2933105.
  20. Lili Su and Nitin H. Vaidya. Non-Bayesian learning in the presence of Byzantine agents. In Distributed Computing - 30th International Symposium, DISC 2016, Paris, France, September 27-29, 2016. Proceedings, pages 414–427, 2016. URL: https://doi.org/10.1007/978-3-662-53426-7_30.
  21. Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao. Application of machine learning in wireless networks: Key techniques and open issues. IEEE Communications Surveys & Tutorials, 21(4):3072–3108, 2019.
  22. Zeyi Tao and Qun Li. eSGD: Communication efficient distributed deep learning on the edge. In USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18), Boston, MA, July 2018. USENIX Association. URL: https://www.usenix.org/conference/hotedge18/presentation/tao.
  23. Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, and Jan S. Rellermeyer. A survey on distributed machine learning. ACM Comput. Surv., 53(2), March 2020. URL: https://doi.org/10.1145/3377454.
  24. Cong Xie, Sanmi Koyejo, and Indranil Gupta. Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of Machine Learning Research, volume 97, pages 6893–6901, Long Beach, California, USA, 2019. PMLR. URL: http://proceedings.mlr.press/v97/xie19b.html.
  25. Qinzi Zhang and Lewis Tseng. Echo-CGC: A communication-efficient Byzantine-tolerant distributed machine learning algorithm in single-hop radio network, 2020. URL: http://arxiv.org/abs/2011.07447.