,
Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Creative Commons Attribution 3.0 Unported license
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1 1/2-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.
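As a rough illustration of the kind of model-free learner the abstract refers to, the following is a minimal sketch of tabular minimax Q-learning on a turn-based stochastic reachability game. It is not the authors' implementation and it does not reproduce the parity-to-reachability reduction. The toy arena, the names `OWNER`, `TRANSITIONS`, `sample_successor`, `state_value` and `minimax_q_learning`, and the choice of a 1/n learning rate with uniform exploration are all assumptions made for this example.

```python
import random

# Hypothetical toy arena (not taken from the paper): state 0 belongs to
# player Max, state 1 to player Min; "goal" and "sink" are absorbing.
OWNER = {0: "max", 1: "min"}

# TRANSITIONS[state][action] = list of (probability, successor).
# Only the simulator below reads these probabilities; the learner never does.
TRANSITIONS = {
    0: {"a": [(0.5, "goal"), (0.5, "sink")], "b": [(1.0, 1)]},
    1: {"c": [(0.9, "goal"), (0.1, "sink")], "d": [(0.3, "goal"), (0.7, "sink")]},
}


def sample_successor(state, action, rng):
    """Simulator step: return one successor sampled from the hidden distribution."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _ in outcomes]
    successors = [s for _, s in outcomes]
    return rng.choices(successors, weights=weights)[0]


def state_value(q, state):
    """Current value estimate: 1 at the goal, 0 at the sink, else max/min over Q."""
    if state == "goal":
        return 1.0
    if state == "sink":
        return 0.0
    pick = max if OWNER[state] == "max" else min
    return pick(q[state].values())


def minimax_q_learning(episodes=20000, seed=0):
    """Estimate reachability values from sampled transitions only."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    visits = {s: {a: 0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        state = rng.choice(list(TRANSITIONS))  # random non-absorbing start
        while state in TRANSITIONS:
            action = rng.choice(list(TRANSITIONS[state]))  # uniform exploration
            successor = sample_successor(state, action, rng)
            visits[state][action] += 1
            alpha = 1.0 / visits[state][action]  # decaying learning rate
            # Minimax backup: bootstrap from the successor owner's best response.
            target = state_value(q, successor)
            q[state][action] += alpha * (target - q[state][action])
            state = successor
    return q


if __name__ == "__main__":
    q = minimax_q_learning()
    print({s: round(state_value(q, s), 2) for s in TRANSITIONS})
```

In this toy arena Min's best response at state 1 reaches the goal with probability 0.3, so Max prefers action `a` at state 0, and the learned estimates should approach 0.5 and 0.3 respectively. The learner queries only `sample_successor`, which mirrors the abstract's point that the transition probabilities need not be known.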
@InProceedings{hahn_et_al:LIPIcs.CONCUR.2020.21,
author = {Hahn, Ernst Moritz and Perez, Mateo and Schewe, Sven and Somenzi, Fabio and Trivedi, Ashutosh and Wojtczak, Dominik},
title = {{Model-Free Reinforcement Learning for Stochastic Parity Games}},
booktitle = {31st International Conference on Concurrency Theory (CONCUR 2020)},
pages = {21:1--21:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-160-3},
ISSN = {1868-8969},
year = {2020},
volume = {171},
editor = {Konnov, Igor and Kov\'{a}cs, Laura},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2020.21},
URN = {urn:nbn:de:0030-drops-128332},
doi = {10.4230/LIPIcs.CONCUR.2020.21},
annote = {Keywords: Reinforcement learning, Stochastic games, Omega-regular objectives}
}