Deep neural networks are trained by solving huge optimization problems with large datasets and millions of variables. On the surface, it seems that the size of these problems makes them a natural target for distributed computing. Despite this, most deep learning research still takes place on a single compute node with a small number of GPUs, and only recently have researchers succeeded in unlocking the power of HPC. In this talk, we'll give a brief overview of how deep networks are trained, and use HPC tools to explore and explain deep network behaviors. Then, we'll explain the problems and challenges that arise when scaling deep nets across large systems, and highlight reasons why naive distributed training methods fail. Finally, we'll discuss recent algorithmic innovations that have overcome these limitations, including "big batch" training for tightly coupled clusters and supercomputers, and "variance reduction" strategies to reduce communication in high-latency settings.
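To make the "big batch" idea concrete, the sketch below simulates synchronous data-parallel SGD: each worker computes a gradient on its own shard of the global batch, the gradients are averaged (standing in for an MPI/NCCL all-reduce), and one synchronous update is applied. This is not the method presented in the talk; the worker count, batch sizes, learning rate, and the toy least-squares objective are all illustrative assumptions.

```python
# Minimal sketch of synchronous "big batch" data-parallel SGD (illustrative only).
# Workers are simulated in a loop; in practice the gradient averaging would be an
# all-reduce across nodes.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize mean((X @ w - y)**2).
n_samples, n_features = 4096, 32
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

def local_gradient(w, idx):
    """Gradient of the mean squared error over one worker's local minibatch."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

n_workers = 8      # simulated compute nodes (assumption)
local_batch = 64   # per-worker minibatch; global batch = n_workers * local_batch
lr = 0.05          # learning rate; typically scaled with the global batch size
w = np.zeros(n_features)

for step in range(200):
    # Each worker draws a local minibatch and computes its gradient independently.
    grads = []
    for _ in range(n_workers):
        idx = rng.choice(n_samples, size=local_batch, replace=False)
        grads.append(local_gradient(w, idx))
    # "All-reduce": average the per-worker gradients, then take one synchronous step.
    g = np.mean(grads, axis=0)
    w -= lr * g

print("final loss:", np.mean((X @ w - y) ** 2))
```

In real deployments the averaging step is a collective communication primitive, and large-batch recipes usually pair the enlarged global batch with a rescaled and warmed-up learning rate to preserve convergence.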