DROPS

Document

DOI: 10.4230/LIPIcs.CONCUR.2020.21

Model-Free Reinforcement Learning for Stochastic Parity Games

Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)

Abstract

This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1 1/2-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.

Cite as

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Model-Free Reinforcement Learning for Stochastic Parity Games. In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{hahn_et_al:LIPIcs.CONCUR.2020.21,
  author =	{Hahn, Ernst Moritz and Perez, Mateo and Schewe, Sven and Somenzi, Fabio and Trivedi, Ashutosh and Wojtczak, Dominik},
  title =	{{Model-Free Reinforcement Learning for Stochastic Parity Games}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.21},
  URN =		{urn:nbn:de:0030-drops-128332},
  doi =		{10.4230/LIPIcs.CONCUR.2020.21},
  annote =	{Keywords: Reinforcement learning, Stochastic games, Omega-regular objectives}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICALP.2020.3

How to Play in Infinite MDPs (Invited Talk)

Authors: Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Patrick Totzke, and Dominik Wojtczak

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

Abstract

Markov decision processes (MDPs) are a standard model for dynamic systems that exhibit both stochastic and nondeterministic behavior. For MDPs with finite state space it is known that for a wide range of objectives there exist optimal strategies that are memoryless and deterministic. In contrast, if the state space is infinite, optimal strategies may not exist, and optimal or ε-optimal strategies may require (possibly infinite) memory. In this paper we consider qualitative objectives: reachability, safety, (co-)Büchi, and other parity objectives. We aim at giving an introduction to a collection of techniques that allow for the construction of strategies with little or no memory in countably infinite MDPs.

Cite as

Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Patrick Totzke, and Dominik Wojtczak. How to Play in Infinite MDPs (Invited Talk). In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 3:1-3:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kiefer_et_al:LIPIcs.ICALP.2020.3,
  author =	{Kiefer, Stefan and Mayr, Richard and Shirmohammadi, Mahsa and Totzke, Patrick and Wojtczak, Dominik},
  title =	{{How to Play in Infinite MDPs}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{3:1--3:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.3},
  URN =		{urn:nbn:de:0030-drops-124103},
  doi =		{10.4230/LIPIcs.ICALP.2020.3},
  annote =	{Keywords: Markov decision processes}
}

Search Results

Documents authored by Wojtczak, Dominik

Model-Free Reinforcement Learning for Stochastic Parity Games

Abstract

Cite as

How to Play in Infinite MDPs (Invited Talk)

Abstract

Cite as

Thanks for your feedback!

Could not send message