A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs

Authors: Marnix Suilen, Marck van der Vegt, and Sebastian Junges

Published in: LIPIcs, Volume 311, 35th International Conference on Concurrency Theory (CONCUR 2024)

Markov Decision Processes (MDPs) model systems with uncertain transition dynamics. Multiple-environment MDPs (MEMDPs) extend MDPs. They intuitively reflect finite sets of MDPs that share the same state and action spaces but differ in the transition dynamics. The key objective in MEMDPs is to find a single strategy that satisfies a given objective in every associated MDP. The main result of this paper is PSPACE-completeness for almost-sure Rabin objectives in MEMDPs. This result clarifies the complexity landscape for MEMDPs and contrasts with results for the more general class of partially observable MDPs (POMDPs), where almost-sure reachability is already EXP-complete, and almost-sure Rabin objectives are undecidable.
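
To make the MEMDP setting concrete, the following is a minimal sketch and not the paper's PSPACE algorithm. It only checks whether one given memoryless strategy reaches a target set almost surely in every environment, using almost-sure reachability as a simpler stand-in for Rabin objectives. Winning strategies in MEMDPs may need memory, so a negative answer for every memoryless strategy proves nothing. All names (`induced_chain`, `almost_sure_reach`, `wins_in_all`) and the two-environment example are assumptions made for illustration.

```python
from typing import Dict, List, Set

# An MDP maps state -> action -> {successor: probability}.
MDP = Dict[str, Dict[str, Dict[str, float]]]
# A Markov chain maps state -> {successor: probability}.
Chain = Dict[str, Dict[str, float]]


def induced_chain(mdp: MDP, strategy: Dict[str, str]) -> Chain:
    """Fix one action per state, which turns the MDP into a Markov chain."""
    return {s: mdp[s][strategy[s]] for s in mdp}


def can_reach(chain: Chain, targets: Set[str]) -> Set[str]:
    """States that reach a target with positive probability (backward fixpoint)."""
    good = set(targets)
    changed = True
    while changed:
        changed = False
        for s, succ in chain.items():
            if s not in good and any(p > 0 and t in good for t, p in succ.items()):
                good.add(s)
                changed = True
    return good


def almost_sure_reach(chain: Chain, init: str, targets: Set[str]) -> bool:
    """In a finite chain, the target is reached with probability 1 from init iff
    every state reachable before hitting the target can still reach it."""
    good = can_reach(chain, targets)
    seen: Set[str] = set()
    stack = [init]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if s not in good:
            return False
        if s in targets:
            continue  # do not explore beyond the target
        stack.extend(t for t, p in chain[s].items() if p > 0)
    return True


def wins_in_all(envs: List[MDP], strategy: Dict[str, str],
                init: str, targets: Set[str]) -> bool:
    """The MEMDP requirement: the same strategy must win in every environment."""
    return all(almost_sure_reach(induced_chain(m, strategy), init, targets)
               for m in envs)


# Hypothetical example: same states and actions, different transition dynamics.
absorbing = {"goal": {"stay": {"goal": 1.0}}, "sink": {"stay": {"sink": 1.0}}}
env1: MDP = {"s0": {"a": {"goal": 0.5, "s0": 0.5}, "b": {"sink": 1.0}}, **absorbing}
env2: MDP = {"s0": {"a": {"goal": 0.9, "s0": 0.1}, "b": {"goal": 1.0}}, **absorbing}
envs = [env1, env2]
```

In this example, always playing `a` in `s0` wins in both environments, whereas `b` wins only in `env2` and therefore is not a winning strategy for the MEMDP.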

Marnix Suilen, Marck van der Vegt, and Sebastian Junges. A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs. In 35th International Conference on Concurrency Theory (CONCUR 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 311, pp. 40:1-40:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

  author =	{Suilen, Marnix and van der Vegt, Marck and Junges, Sebastian},
  title =	{{A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs}},
  booktitle =	{35th International Conference on Concurrency Theory (CONCUR 2024)},
  pages =	{40:1--40:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-339-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{311},
  editor =	{Majumdar, Rupak and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-208120},
  doi =		{10.4230/LIPIcs.CONCUR.2024.40},
  annote =	{Keywords: Markov Decision Processes, partial observability, linear-time Objectives}

Scalable Analysis of Probabilistic Models and Programs (Dagstuhl Seminar 23241)

Authors: Sebastian Junges, Joost-Pieter Katoen, Scott Sanner, Guy Van den Broeck, and Bahare Salmani

Published in: Dagstuhl Reports, Volume 13, Issue 6 (2024)

This report documents the program and the outcomes of Dagstuhl Seminar 23241 "Scalable Analysis of Probabilistic Models and Programs". The seminar brought together researchers from probabilistic graphical models, verification of probabilistic programming languages, and probabilistic planning. The communities bring vastly different perspectives on the methods and goals of inference under uncertainty. In this seminar, we worked towards a common understanding of how the different angles yield subtle differences in the problem statements and how the different methods provide different strengths and weaknesses. The report describes the different areas, the activities during the seminar including hot topics that were vividly discussed, and an overview of the technical talks.

Sebastian Junges, Joost-Pieter Katoen, Scott Sanner, Guy Van den Broeck, and Bahare Salmani. Scalable Analysis of Probabilistic Models and Programs (Dagstuhl Seminar 23241). In Dagstuhl Reports, Volume 13, Issue 6, pp. 1-21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

  author =	{Junges, Sebastian and Katoen, Joost-Pieter and Sanner, Scott and Van den Broeck, Guy and Salmani, Bahare},
  title =	{{Scalable Analysis of Probabilistic Models and Programs (Dagstuhl Seminar 23241)}},
  pages =	{1--21},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2024},
  volume =	{13},
  number =	{6},
  editor =	{Junges, Sebastian and Katoen, Joost-Pieter and Sanner, Scott and Van den Broeck, Guy and Salmani, Bahare},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-196362},
  doi =		{10.4230/DagRep.13.6.1},
  annote =	{Keywords: model counting, probabilistic inference, probabilistic model checking, probabilistic planning, probabilistic programs}

Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper)

Authors: Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)

This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In experiments on the arcade game PAC-MAN and on a case study involving service robots, we demonstrate that learning efficiency increases: learning with a shield needs orders of magnitude fewer episodes.
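
The shielding idea described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction. The per-action safety probabilities are assumed to have been computed beforehand by a model checker. The relative threshold `delta` and the rule "keep an action if its safety probability is at least `delta` times the best one in that state" are chosen here for illustration. The names `build_shield` and `shielded_choice` are hypothetical.

```python
from typing import Dict, Set

# safety[state][action] = probability of staying safe after taking the action
# (assumed to be precomputed by formal verification on a safety-relevant fragment).
Safety = Dict[str, Dict[str, float]]


def build_shield(safety: Safety, delta: float) -> Dict[str, Set[str]]:
    """Allow an action if it is at most a factor `delta` worse than the safest one.

    delta = 1.0 keeps only the safest actions; delta = 0.0 allows everything.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    allowed = {}
    for state, values in safety.items():
        best = max(values.values())
        allowed[state] = {a for a, v in values.items() if v >= delta * best}
    return allowed


def shielded_choice(q_values: Dict[str, float], allowed: Set[str]) -> str:
    """Greedy RL action selection restricted to the actions the shield permits."""
    return max(sorted(allowed), key=lambda a: q_values[a])


# Hypothetical numbers for a single state.
safety = {"s0": {"left": 0.99, "right": 0.60, "stay": 0.95}}
q_values = {"left": 1.0, "right": 5.0, "stay": 2.0}

shield = build_shield(safety, delta=0.9)
action = shielded_choice(q_values, shield["s0"])
```

Here the learner would prefer `right` on reward alone, but the shield blocks it as too risky, so the agent picks `stay`. A smaller `delta` permits more exploration at the cost of safety, which is the tradeoff the abstract mentions.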

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, and Roderick Bloem. Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper). In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

  author =	{Jansen, Nils and K\"{o}nighofer, Bettina and Junges, Sebastian and Serban, Alex and Bloem, Roderick},
  title =	{{Safe Reinforcement Learning Using Probabilistic Shields}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{3:1--3:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-128155},
  doi =		{10.4230/LIPIcs.CONCUR.2020.3},
  annote =	{Keywords: Safe Reinforcement Learning, Formal Verification, Safe Exploration, Model Checking, Markov Decision Process}

On the Complexity of Reachability in Parametric Markov Decision Processes

Authors: Tobias Winkler, Sebastian Junges, Guillermo A. Pérez, and Joost-Pieter Katoen

Published in: LIPIcs, Volume 140, 30th International Conference on Concurrency Theory (CONCUR 2019)

This paper studies parametric Markov decision processes (pMDPs), an extension to Markov decision processes (MDPs) where transition probabilities are described by polynomials over a finite set of parameters. Fixing values for all parameters yields MDPs. In particular, this paper studies the complexity of finding values for these parameters such that the induced MDP satisfies some reachability constraints. We discuss different variants depending on the comparison operator in the constraints and the domain of the parameter values. We improve all known lower bounds for this problem, and notably provide ETR-completeness results for distinct variants of this problem. Furthermore, we provide insights into the functions describing the induced reachability probabilities, and how pMDPs generalise concurrent stochastic reachability games.
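
As a small illustration of "fixing values for all parameters yields MDPs", the sketch below instantiates a toy pMDP and evaluates a reachability probability on the result. It does not implement any procedure from the paper. The polynomials are represented as plain Python functions of a valuation, reachability is approximated by value iteration with floating-point numbers, and the model and the names `instantiate` and `max_reach` are assumptions.

```python
from typing import Callable, Dict, Set

Valuation = Dict[str, float]
# pMDP: state -> action -> {successor: polynomial in the parameters}
PMDP = Dict[str, Dict[str, Dict[str, Callable[[Valuation], float]]]]
MDP = Dict[str, Dict[str, Dict[str, float]]]


def instantiate(pmdp: PMDP, val: Valuation) -> MDP:
    """Evaluate every polynomial; reject valuations that do not give distributions."""
    mdp: MDP = {}
    for s, actions in pmdp.items():
        mdp[s] = {}
        for a, succ in actions.items():
            dist = {t: f(val) for t, f in succ.items()}
            if any(p < 0 for p in dist.values()) or abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"valuation {val} is not well-defined at ({s}, {a})")
            mdp[s][a] = dist
    return mdp


def max_reach(mdp: MDP, targets: Set[str], iterations: int = 1000) -> Dict[str, float]:
    """Approximate maximal reachability probabilities by value iteration from 0."""
    x = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(iterations):
        x = {s: 1.0 if s in targets else
             max(sum(p * x[t] for t, p in dist.items()) for dist in mdp[s].values())
             for s in mdp}
    return x


# Toy model with parameters p and q (hypothetical):
# s0 goes to goal with probability p, otherwise to s1;
# s1 returns to s0 with probability q, otherwise falls into sink.
pmdp: PMDP = {
    "s0": {"a": {"goal": lambda v: v["p"], "s1": lambda v: 1 - v["p"]}},
    "s1": {"a": {"s0": lambda v: v["q"], "sink": lambda v: 1 - v["q"]}},
    "goal": {"a": {"goal": lambda v: 1.0}},
    "sink": {"a": {"sink": lambda v: 1.0}},
}

prob = max_reach(instantiate(pmdp, {"p": 0.5, "q": 0.5}), {"goal"})["s0"]
```

For this model the reachability probability from `s0` is the rational function p / (1 - (1 - p) q), which evaluates to 2/3 at p = q = 0.5. The synthesis question from the abstract then asks whether some valuation makes this value satisfy a constraint such as "at least 0.6".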

Tobias Winkler, Sebastian Junges, Guillermo A. Pérez, and Joost-Pieter Katoen. On the Complexity of Reachability in Parametric Markov Decision Processes. In 30th International Conference on Concurrency Theory (CONCUR 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 140, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

  author =	{Winkler, Tobias and Junges, Sebastian and P\'{e}rez, Guillermo A. and Katoen, Joost-Pieter},
  title =	{{On the Complexity of Reachability in Parametric Markov Decision Processes}},
  booktitle =	{30th International Conference on Concurrency Theory (CONCUR 2019)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-121-4},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{140},
  editor =	{Fokkink, Wan and van Glabbeek, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-109162},
  doi =		{10.4230/LIPIcs.CONCUR.2019.14},
  annote =	{Keywords: Parametric Markov decision processes, Formal verification, ETR, Complexity}
