Search Results

Documents authored by Bloem, Roderick


Document
Invited Paper
Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper)

Authors: Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)


Abstract
This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.

Cite as

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem. Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper). In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{jansen_et_al:LIPIcs.CONCUR.2020.3,
  author =	{Jansen, Nils and K\"{o}nighofer, Bettina and Junges, Sebastian and Serban, Alex and Bloem, Roderick},
  title =	{{Safe Reinforcement Learning Using Probabilistic Shields}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{3:1--3:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.3},
  URN =		{urn:nbn:de:0030-drops-128155},
  doi =		{10.4230/LIPIcs.CONCUR.2020.3},
  annote =	{Keywords: Safe Reinforcement Learning, Formal Verification, Safe Exploration, Model Checking, Markov Decision Process}
}
Document
Synthesis of Distributed Algorithms with Parameterized Threshold Guards

Authors: Marijana Lazic, Igor Konnov, Josef Widder, and Roderick Bloem

Published in: LIPIcs, Volume 95, 21st International Conference on Principles of Distributed Systems (OPODIS 2017)


Abstract
Fault-tolerant distributed algorithms are notoriously hard to get right. In this paper we introduce an automated method that helps in that process: the designer provides specifications (the problem to be solved) and a sketch of a distributed algorithm that keeps arithmetic details unspecified. Our tool then automatically fills the missing parts. Fault-tolerant distributed algorithms are typically parameterized, that is, they are designed to work for any number n of processes and any number t of faults, provided some resilience condition holds; e.g., n > 3t. In this paper we automatically synthesize distributed algorithms that work for all parameter values that satisfy the resilience condition. We focus on threshold- guarded distributed algorithms, where actions are taken only if a sufficiently large number of messages is received, e.g., more than t or n/2. Both expressions can be derived by choosing the right values for the coefficients a, b, and c, in the sketch of a threshold a·n+b·t+c. Our method takes as input a sketch of an asynchronous threshold-based fault-tolerant distributed algorithm — where the guards are missing exact coefficients—and then iteratively picks the values for the coefficients. Our approach combines recent progress in parameterized model checking of distributed algo- rithms with counterexample-guided synthesis. Besides theoretical results on termination of the synthesis procedure, we experimentally evaluate our method and show that it can synthesize sev- eral distributed algorithms from the literature, e.g., Byzantine reliable broadcast and Byzantine one-step consensus. In addition, for several new variations of safety and liveness specifications, our tool generates new distributed algorithms.

Cite as

Marijana Lazic, Igor Konnov, Josef Widder, and Roderick Bloem. Synthesis of Distributed Algorithms with Parameterized Threshold Guards. In 21st International Conference on Principles of Distributed Systems (OPODIS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 95, pp. 32:1-32:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{lazic_et_al:LIPIcs.OPODIS.2017.32,
  author =	{Lazic, Marijana and Konnov, Igor and Widder, Josef and Bloem, Roderick},
  title =	{{Synthesis of Distributed Algorithms with Parameterized Threshold Guards}},
  booktitle =	{21st International Conference on Principles of Distributed Systems (OPODIS 2017)},
  pages =	{32:1--32:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-061-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{95},
  editor =	{Aspnes, James and Bessani, Alysson and Felber, Pascal and Leit\~{a}o, Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2017.32},
  URN =		{urn:nbn:de:0030-drops-86359},
  doi =		{10.4230/LIPIcs.OPODIS.2017.32},
  annote =	{Keywords: fault-tolerant distributed algorithms, byzantine faults, parameterized model checking, program synthesis}
}