Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems (Dagstuhl Seminar 15281)

Authors Henri Casanova, Ewa Deelman, Yves Robert, Uwe Schwiegelshohn and all authors of the abstracts in this report




File

DagRep.5.7.1.pdf
  • Filesize: 0.76 MB
  • 21 pages


Author Details

Henri Casanova
Ewa Deelman
Yves Robert
Uwe Schwiegelshohn
and all authors of the abstracts in this report

Cite As

Henri Casanova, Ewa Deelman, Yves Robert, and Uwe Schwiegelshohn. Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems (Dagstuhl Seminar 15281). In Dagstuhl Reports, Volume 5, Issue 7, pp. 1-21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/DagRep.5.7.1

Abstract

Large-scale systems face two main challenges: failure management and energy management. Failure management, the goal of which is to achieve resilience, is necessary because a large number of hardware resources implies a large number of failures during the execution of an application. Energy management, the goal of which is to optimize power consumption and to handle thermal issues, is also necessary because of both monetary and environmental constraints: typical applications executed in HPC and/or cloud environments lead to large power consumption and heat dissipation due to intensive computation and communication workloads. The main objective of this Dagstuhl seminar was to gather two communities: (i) system-oriented researchers, who study high-level resource-provisioning policies, pragmatic resource allocation and scheduling heuristics, novel approaches for designing and deploying systems software infrastructures, and tools for monitoring/measuring the state of the system; and (ii) algorithm-oriented researchers, who investigate formal models and algorithmic solutions for resilience and energy efficiency problems. Both communities focused on workflow applications during the seminar and discussed various issues related to the efficient, resilient, and energy-efficient execution of workflows on distributed platforms. This report provides a brief executive summary of the seminar and lists all the presented material.
Keywords
  • Fault tolerance
  • Resilience
  • Energy efficiency
  • Distributed and high performance computing
  • Scheduling
  • Workflows
