Dagstuhl Reports, Volume 10, Issue 3




Event

Dagstuhl Seminars 20101, 20111


Documents

Document
Complete Issue
Dagstuhl Reports, Volume 10, Issue 3, March 2020, Complete Issue


Cite as

Dagstuhl Reports, Volume 10, Issue 3, pp. 1-72, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@Article{DagRep.10.3,
  title =	{{Dagstuhl Reports, Volume 10, Issue 3, March 2020, Complete Issue}},
  pages =	{1--72},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2020},
  volume =	{10},
  number =	{3},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.10.3},
  URN =		{urn:nbn:de:0030-drops-134271},
  doi =		{10.4230/DagRep.10.3},
  annote =	{Keywords: Dagstuhl Reports, Volume 10, Issue 3, March 2020, Complete Issue}
}
Document
Front Matter
Dagstuhl Reports, Table of Contents, Volume 10, Issue 3, 2020


Cite as

Dagstuhl Reports, Volume 10, Issue 3, pp. i-ii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@Article{DagRep.10.3.i,
  title =	{{Dagstuhl Reports, Table of Contents, Volume 10, Issue 3, 2020}},
  pages =	{i--ii},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2020},
  volume =	{10},
  number =	{3},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.10.3.i},
  URN =		{urn:nbn:de:0030-drops-134282},
  doi =		{10.4230/DagRep.10.3.i},
  annote =	{Keywords: Table of Contents, Frontmatter}
}
Document
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101)

Authors: Luc Giraud, Ulrich Rüde, and Linda Stals


Abstract
This work is based on the seminar "Resiliency in Numerical Algorithm Design for Extreme Scale Simulations", held March 1-6, 2020 at Schloss Dagstuhl and attended by all the authors. Advanced supercomputing is characterized by very high computation speeds, achieved at the cost of an enormous consumption of resources. A typical large-scale computation running for 48 hours on a system consuming 20 MW, as predicted for exascale systems, would consume roughly a million kWh. This corresponds to about 100k Euro in energy cost for executing 10^{23} floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during execution. Moreover, if a single operation suffers a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime. With a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts suggest that the mean time between failures could be shorter than the time to recover from such a checkpoint. In that case, large calculations at scale might make no progress at all unless robust alternatives are investigated. More advanced resilience techniques must therefore be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help to identify the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve system- or application-level checkpointing and rollback strategies for the case in which an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications, and systems in achieving resilience for extreme-scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
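The energy and operation counts quoted in this abstract can be checked with a quick back-of-the-envelope calculation. The Python sketch below takes the 48-hour runtime and the 20 MW power draw from the abstract. Two further inputs are illustrative assumptions, not figures from the seminar: a sustained rate of 10^18 floating-point operations per second (one exaflop/s), and an electricity price of 0.10 Euro per kWh.

```python
# Back-of-the-envelope check of the exascale cost figures.
# From the abstract: 48 h runtime, 20 MW power draw.
# Assumptions (not from the abstract): 1e18 flop/s sustained, 0.10 EUR/kWh.

RUNTIME_HOURS = 48            # from the abstract
POWER_MW = 20                 # from the abstract
SUSTAINED_FLOPS = 1e18        # assumption: one exaflop/s sustained
PRICE_EUR_PER_KWH = 0.10      # assumption: illustrative electricity price


def energy_kwh(hours: float, power_mw: float) -> float:
    """Energy in kWh for a run of `hours` at a constant `power_mw` megawatts."""
    return hours * power_mw * 1000.0  # 1 MW = 1000 kW


def total_flops(hours: float, flops_per_second: float) -> float:
    """Total floating-point operations executed during `hours` at a fixed rate."""
    return hours * 3600.0 * flops_per_second


if __name__ == "__main__":
    e = energy_kwh(RUNTIME_HOURS, POWER_MW)
    print(f"energy: {e:,.0f} kWh")
    print(f"cost:   {e * PRICE_EUR_PER_KWH:,.0f} EUR")
    print(f"work:   {total_flops(RUNTIME_HOURS, SUSTAINED_FLOPS):.2e} flop")
```

Under these assumptions the script reports the following:

* The run uses 960,000 kWh, which is roughly the million kWh quoted.
* It costs about 96,000 Euro.
* It performs about 1.73 × 10^23 operations.

These values match the orders of magnitude given in the abstract.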

Cite as

Luc Giraud, Ulrich Rüde, and Linda Stals. Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101). In Dagstuhl Reports, Volume 10, Issue 3, pp. 1-57, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@Article{giraud_et_al:DagRep.10.3.1,
  author =	{Giraud, Luc and R\"{u}de, Ulrich and Stals, Linda},
  title =	{{Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101)}},
  pages =	{1--57},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2020},
  volume =	{10},
  number =	{3},
  editor =	{Giraud, Luc and R\"{u}de, Ulrich and Stals, Linda},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.10.3.1},
  URN =		{urn:nbn:de:0030-drops-134290},
  doi =		{10.4230/DagRep.10.3.1},
  annote =	{Keywords: Numerical algorithms, Parallel computer architecture, Fault tolerance, Resilience}
}
Document
Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111)

Authors: Paolo Bientinesi, David Ham, Furong Huang, Paul H. J. Kelly, Christian Lengauer, and Saday Sadayappan


Abstract
Tensors are higher-dimensional analogs of matrices and represent a key data abstraction for many applications in computational science and data science. High-performance numerical libraries for matrix computations are widely available on diverse hardware platforms. By contrast, only limited software infrastructure exists today for high-performance tensor computations. Recent research developments have resulted in the formulation of many machine learning algorithms in terms of tensor computations. Tensor computations have also emerged as fundamental building blocks for many algorithms in data science and computational science. Several concurrent efforts have therefore targeted the development of libraries, frameworks, and domain-specific compilers to support the rising demand for high-performance tensor computations. However, there is currently very little coordination among the various groups of developers. Further, the groups developing high-performance libraries and frameworks for tensor computations are still rather disconnected from the research community that develops applications using tensors as a key data abstraction. The main goal of this Dagstuhl Seminar has been to bring together two communities: first, researchers from disciplines developing applications centered around tensor computations; and second, researchers developing software infrastructure for efficient tensor computation primitives. Invitees from the former group included experts in machine learning and data analytics, and computational scientists developing tensor-based applications. Invitees from the latter group included experts in compiler optimization and experts in numerical methods. A very fruitful exchange of ideas across these four research communities took place. Discussions covered the variety of needs and use cases for tensor computations, and the challenges and opportunities in developing high-performance software to satisfy those needs.
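This sketch illustrates what "higher-dimensional analogs of matrices" means in practice. The hypothetical helper `contract_last_index` contracts a third-order tensor A with a matrix B over one shared index, computing C[i][j][l] = sum_k A[i][j][k] * B[k][l]. Applied to a tensor with a single slab, it reduces to ordinary matrix multiplication. Plain nested lists are used only to keep the example dependency-free. This is not code from the seminar; the optimized libraries and compilers discussed there exist precisely because such naive loops are slow.

```python
from typing import List

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]


def contract_last_index(a: Tensor3, b: Matrix) -> Tensor3:
    """Compute C[i][j][l] = sum_k A[i][j][k] * B[k][l] (naive loops)."""
    n_k = len(b)      # size of the contracted index k
    n_l = len(b[0])   # size of the new index l
    result: Tensor3 = []
    for slab in a:                    # index i
        rows: Matrix = []
        for fiber in slab:            # index j; fiber is A[i][j][:]
            if len(fiber) != n_k:
                raise ValueError("contracted dimensions do not match")
            rows.append([sum(fiber[k] * b[k][l] for k in range(n_k))
                         for l in range(n_l)])
        result.append(rows)
    return result


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product as the special case of a tensor with a single slab."""
    return contract_last_index([a], b)[0]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. This is the same result as the usual matrix product, obtained here as a tensor contraction.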

Cite as

Paolo Bientinesi, David Ham, Furong Huang, Paul H. J. Kelly, Christian Lengauer, and Saday Sadayappan. Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111). In Dagstuhl Reports, Volume 10, Issue 3, pp. 58-70, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@Article{bientinesi_et_al:DagRep.10.3.58,
  author =	{Bientinesi, Paolo and Ham, David and Huang, Furong and Kelly, Paul H. J. and Lengauer, Christian and Sadayappan, Saday},
  title =	{{Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111)}},
  pages =	{58--70},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2020},
  volume =	{10},
  number =	{3},
  editor =	{Bientinesi, Paolo and Ham, David and Huang, Furong and Kelly, Paul H. J. and Lengauer, Christian and Sadayappan, Saday},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.10.3.58},
  URN =		{urn:nbn:de:0030-drops-134303},
  doi =		{10.4230/DagRep.10.3.58},
  annote =	{Keywords: compilers, computational science, linear algebra, machine learning, numerical methods}
}
