Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper)

Authors: Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)


Abstract
This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that learning efficiency increases, since the shielded agent needs orders of magnitude fewer episodes.
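
To give a rough idea of the mechanism described above, the following is a minimal Python sketch of a relative probabilistic shield wrapped around an epsilon-greedy learner. It assumes the per-state-action safety probabilities have already been computed by a model checker on the safety-relevant fragment of the MDP; all identifiers (safety_prob, delta, q_values) are illustrative assumptions and not code from the paper.

import random
from typing import Dict, Hashable, List

State = Hashable
Action = Hashable

def shielded_actions(state: State,
                     actions: List[Action],
                     safety_prob: Dict[State, Dict[Action, float]],
                     delta: float = 0.1) -> List[Action]:
    """Keep only actions whose safety probability is within a factor
    (1 - delta) of the best available action in this state.

    safety_prob[s][a] stands in for a model-checking result, e.g. the
    probability of avoiding unsafe states within a finite horizon after
    taking action a in state s.
    """
    probs = safety_prob[state]
    best = max(probs[a] for a in actions)
    # The safest action always qualifies, so the allowed set is never empty.
    return [a for a in actions if probs[a] >= (1.0 - delta) * best]

def epsilon_greedy_shielded(state: State,
                            actions: List[Action],
                            q_values: Dict[tuple, float],
                            safety_prob: Dict[State, Dict[Action, float]],
                            epsilon: float = 0.1,
                            delta: float = 0.1) -> Action:
    """Standard epsilon-greedy action selection, restricted to
    shield-approved actions."""
    allowed = shielded_actions(state, actions, safety_prob, delta)
    if random.random() < epsilon:
        return random.choice(allowed)  # explore, but only among safe actions
    return max(allowed, key=lambda a: q_values[(state, a)])  # exploit

With delta = 0 the shield permits only the safest action(s); larger values of delta allow riskier actions and therefore more exploration, which reflects the tradeoff between exploration progress and safety mentioned in the abstract.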

Cite as

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem. Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper). In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@InProceedings{jansen_et_al:LIPIcs.CONCUR.2020.3,
  author =	{Jansen, Nils and K\"{o}nighofer, Bettina and Junges, Sebastian and Serban, Alex and Bloem, Roderick},
  title =	{{Safe Reinforcement Learning Using Probabilistic Shields}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{3:1--3:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.3},
  URN =		{urn:nbn:de:0030-drops-128155},
  doi =		{10.4230/LIPIcs.CONCUR.2020.3},
  annote =	{Keywords: Safe Reinforcement Learning, Formal Verification, Safe Exploration, Model Checking, Markov Decision Process}
}