2 Search Results for "Wojtczak, Dominik"


Document
Model-Free Reinforcement Learning for Stochastic Parity Games

Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak

Published in: LIPIcs, Volume 171, 31st International Conference on Concurrency Theory (CONCUR 2020)


Abstract
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on an experimental evaluation of both reductions.
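
To make the model-free layer concrete, here is a minimal sketch (not from the paper) of tabular minimax Q-learning on a turn-based stochastic reachability game, i.e. the kind of game the reduction produces for a fixed ε. The game interface (reset, owner, actions, step, is_target, is_sink) is a hypothetical stand-in, not the authors' API; with a discount factor close to 1, the learned Q-values approximate the probability of reaching the target under mutual best responses.

import random
from collections import defaultdict

# Hypothetical turn-based reachability-game interface (an assumption, not the
# paper's API):
#   game.reset() -> initial state
#   game.owner(s) -> "max" or "min" (which player moves at s)
#   game.actions(s) -> list of actions available at s
#   game.step(s, a) -> (next_state, reached); reached is True on hitting the target
#   game.is_target(s), game.is_sink(s) -> terminal tests

def minimax_q(game, episodes=10_000, alpha=0.1, gamma=0.999, eps=0.1):
    """Tabular minimax Q-learning on a turn-based stochastic reachability game.

    Q[s][a] estimates the probability of eventually reaching the target when a
    is played at s and both players play (approximately) optimally afterwards.
    """
    Q = defaultdict(lambda: defaultdict(float))

    def state_value(s):
        acts = game.actions(s)
        if not acts:
            return 0.0
        vals = [Q[s][a] for a in acts]
        # Max maximizes, Min minimizes the reachability probability.
        return max(vals) if game.owner(s) == "max" else min(vals)

    for _ in range(episodes):
        s = game.reset()
        while not (game.is_target(s) or game.is_sink(s)):
            acts = game.actions(s)
            if random.random() < eps:  # epsilon-greedy exploration
                a = random.choice(acts)
            else:
                pick = max if game.owner(s) == "max" else min
                a = pick(acts, key=lambda x: Q[s][x])
            s_next, reached = game.step(s, a)
            reward = 1.0 if reached else 0.0  # paid once, on first reaching the target
            target = reward if (reached or game.is_sink(s_next)) else gamma * state_value(s_next)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

Note that in a turn-based arena each update only needs a max or a min over actions; minimax-Q for concurrent games would instead solve a small matrix game at every update.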

Cite as

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Model-Free Reinforcement Learning for Stochastic Parity Games. In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{hahn_et_al:LIPIcs.CONCUR.2020.21,
  author =	{Hahn, Ernst Moritz and Perez, Mateo and Schewe, Sven and Somenzi, Fabio and Trivedi, Ashutosh and Wojtczak, Dominik},
  title =	{{Model-Free Reinforcement Learning for Stochastic Parity Games}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Konnov, Igor and Kov\'{a}cs, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.21},
  URN =		{urn:nbn:de:0030-drops-128332},
  doi =		{10.4230/LIPIcs.CONCUR.2020.21},
  annote =	{Keywords: Reinforcement learning, Stochastic games, Omega-regular objectives}
}
Document
How to Play in Infinite MDPs (Invited Talk)

Authors: Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Patrick Totzke, and Dominik Wojtczak

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)


Abstract
Markov decision processes (MDPs) are a standard model for dynamic systems that exhibit both stochastic and nondeterministic behavior. For MDPs with a finite state space, it is known that for a wide range of objectives there exist optimal strategies that are memoryless and deterministic. In contrast, if the state space is infinite, optimal strategies may not exist, and optimal or ε-optimal strategies may require (possibly infinite) memory. In this paper we consider qualitative objectives: reachability, safety, (co-)Büchi, and other parity objectives. We aim to give an introduction to a collection of techniques that allow the construction of strategies with little or no memory in countably infinite MDPs.
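
For contrast with the infinite-state setting, the finite-state fact the abstract cites can be illustrated directly: the sketch below (not from the paper) runs undiscounted value iteration for maximal reachability probabilities on a small finite MDP and extracts an optimal strategy that is memoryless and deterministic. The dictionary encoding of the MDP is an assumption made for illustration.

# A minimal sketch (not from the paper): undiscounted value iteration for the
# maximal reachability probability in a *finite* MDP, plus extraction of an
# optimal strategy that is memoryless and deterministic. The encoding
#   mdp[s][a] = list of (probability, successor) pairs
# is an assumption made for illustration.

def reachability_values(mdp, targets, max_iters=100_000, tol=1e-10):
    V = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(max_iters):
        delta = 0.0
        for s in mdp:
            if s in targets or not mdp[s]:
                continue  # targets stay at 1, action-less sinks stay at 0
            best = max(sum(p * V[t] for p, t in dist) for dist in mdp[s].values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # One optimal action per state suffices: the strategy needs no memory.
    strategy = {s: max(mdp[s], key=lambda a: sum(p * V[t] for p, t in mdp[s][a]))
                for s in mdp if s not in targets and mdp[s]}
    return V, strategy

# Tiny example: repeatedly playing "a" from s0 reaches the goal with probability 1.
example = {
    "s0": {"a": [(0.5, "goal"), (0.5, "s1")], "b": [(1.0, "s1")]},
    "s1": {"a": [(1.0, "s0")]},
    "goal": {},
}
values, sigma = reachability_values(example, targets={"goal"})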

Cite as

Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Patrick Totzke, and Dominik Wojtczak. How to Play in Infinite MDPs (Invited Talk). In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 3:1-3:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{kiefer_et_al:LIPIcs.ICALP.2020.3,
  author =	{Kiefer, Stefan and Mayr, Richard and Shirmohammadi, Mahsa and Totzke, Patrick and Wojtczak, Dominik},
  title =	{{How to Play in Infinite MDPs}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{3:1--3:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.3},
  URN =		{urn:nbn:de:0030-drops-124103},
  doi =		{10.4230/LIPIcs.ICALP.2020.3},
  annote =	{Keywords: Markov decision processes}
}
  • Refine by Author
  • 2 Wojtczak, Dominik
  • 1 Hahn, Ernst Moritz
  • 1 Kiefer, Stefan
  • 1 Mayr, Richard
  • 1 Perez, Mateo

  • Refine by Classification
  • 1 Computing methodologies → Machine learning algorithms
  • 1 Mathematics of computing → Markov processes
  • 1 Mathematics of computing → Probability and statistics
  • 1 Theory of computation → Automata over infinite objects
  • 1 Theory of computation → Convergence and learning in games

  • Refine by Keyword
  • 1 Markov decision processes
  • 1 Omega-regular objectives
  • 1 Reinforcement learning
  • 1 Stochastic games

  • Refine by Type
  • 2 document

  • Refine by Publication Year
  • 2 2020
