2 Search Results for "Zhang, Qinzi"

Document

DOI: 10.4230/LIPIcs.OPODIS.2020.7

Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network

Authors: Qinzi Zhang and Lewis Tseng

Published in: LIPIcs, Volume 184, 24th International Conference on Principles of Distributed Systems (OPODIS 2020)

Abstract

In the past few years, many Byzantine-tolerant distributed machine learning (DML) algorithms have been proposed in the point-to-point communication model. In this paper, we focus on a popular DML framework - the parameter server computation paradigm and iterative learning algorithms that proceed in rounds, e.g., [Gupta and Vaidya, 2020; El-Mhamdi et al., 2020; Chen et al., 2017]. One limitation of prior algorithms in this domain is the high communication complexity. All the Byzantine-tolerant DML algorithms that we are aware of need to send n d-dimensional vectors from worker nodes to the parameter server in each round, where n is the number of workers and d is the number of dimensions of the feature space (which may be in the order of millions). In a wireless network, power consumption is proportional to the number of bits transmitted. Consequently, it is extremely difficult, if not impossible, to deploy these algorithms in power-limited wireless devices. Motivated by this observation, we aim to reduce the communication complexity of Byzantine-tolerant DML algorithms in the single-hop radio network [Alistarh et al., 2010; Bhandari and Vaidya, 2005; Koo, 2004]. Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020 [Gupta and Vaidya, 2020], we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the broadcast properties of the radio network to avoid transmitting the raw gradients (full d-dimensional vectors). In the radio network, each worker is able to overhear previous gradients that were transmitted to the parameter server. Roughly speaking, in Echo-CGC, if a worker "agrees" with a combination of prior gradients, it will broadcast the "echo message" instead of the its raw local gradient. The echo message contains a vector of coefficients (of size at most n) and the ratio of the magnitude between two gradients (a float). In comparison, the traditional approaches need to send n local gradients in each round, where each gradient is typically a vector in a ultra-high dimensional space (d ≫ n). The improvement on communication complexity of our algorithm depends on multiple factors, including number of nodes, number of faulty workers in an execution, and the cost function. We numerically analyze the improvement, and show that with a large number of nodes, Echo-CGC reduces 80% of the communication under standard assumptions.

Cite as

Qinzi Zhang and Lewis Tseng. Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network. In 24th International Conference on Principles of Distributed Systems (OPODIS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 184, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.OPODIS.2020.7,
  author =	{Zhang, Qinzi and Tseng, Lewis},
  title =	{{Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network}},
  booktitle =	{24th International Conference on Principles of Distributed Systems (OPODIS 2020)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-176-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{184},
  editor =	{Bramas, Quentin and Oshman, Rotem and Romano, Paolo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2020.7},
  URN =		{urn:nbn:de:0030-drops-134927},
  doi =		{10.4230/LIPIcs.OPODIS.2020.7},
  annote =	{Keywords: Distributed Machine Learning, Single-hop Radio Network, Byzantine Fault, Communication Complexity, Wireless Communication, Parameter Server}
}

@InProceedings{zhang_et_al:LIPIcs.OPODIS.2020.7,
  author =	{Zhang, Qinzi and Tseng, Lewis},
  title =	{{Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network}},
  booktitle =	{24th International Conference on Principles of Distributed Systems (OPODIS 2020)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-176-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{184},
  editor =	{Bramas, Quentin and Oshman, Rotem and Romano, Paolo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2020.7},
  URN =		{urn:nbn:de:0030-drops-134927},
  doi =		{10.4230/LIPIcs.OPODIS.2020.7},
  annote =	{Keywords: Distributed Machine Learning, Single-hop Radio Network, Byzantine Fault, Communication Complexity, Wireless Communication, Parameter Server}
}

Document

Brief Announcement

DOI: 10.4230/LIPIcs.DISC.2020.53

Brief Announcement: Reaching Approximate Consensus When Everyone May Crash

Authors: Lewis Tseng, Qinzi Zhang, and Yifan Zhang

Published in: LIPIcs, Volume 179, 34th International Symposium on Distributed Computing (DISC 2020)

Abstract

Fault-tolerant consensus is of great importance in distributed systems. This paper studies the asynchronous approximate consensus problem in the crash-recovery model with fair-loss links. In our model, up to f nodes may crash forever, while the rest may crash intermittently. Each node is equipped with a limited-size persistent storage that does not lose data when crashed. We present an algorithm that only stores three values in persistent storage - state, phase index, and a counter.

Cite as

Lewis Tseng, Qinzi Zhang, and Yifan Zhang. Brief Announcement: Reaching Approximate Consensus When Everyone May Crash. In 34th International Symposium on Distributed Computing (DISC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 179, pp. 53:1-53:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{tseng_et_al:LIPIcs.DISC.2020.53,
  author =	{Tseng, Lewis and Zhang, Qinzi and Zhang, Yifan},
  title =	{{Brief Announcement: Reaching Approximate Consensus When Everyone May Crash}},
  booktitle =	{34th International Symposium on Distributed Computing (DISC 2020)},
  pages =	{53:1--53:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-168-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{179},
  editor =	{Attiya, Hagit},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2020.53},
  URN =		{urn:nbn:de:0030-drops-131319},
  doi =		{10.4230/LIPIcs.DISC.2020.53},
  annote =	{Keywords: Approximate Consensus, Fair-loss Channel, Crash-recovery}
}

Refine by Author
2 Tseng, Lewis
2 Zhang, Qinzi
1 Zhang, Yifan

Refine by Classification
1 Computing methodologies → Distributed computing methodologies
1 Computing methodologies → Machine learning
1 Theory of computation → Distributed algorithms

Refine by Keyword
1 Approximate Consensus
1 Byzantine Fault
1 Communication Complexity
1 Crash-recovery
1 Distributed Machine Learning
Show More...

Refine by Type
2 document

Refine by Publication Year
1 2020
1 2021