Model-Free Reinforcement Learning for Stochastic Parity Games

Hahn, Ernst Moritz; Perez, Mateo; Schewe, Sven; Somenzi, Fabio; Trivedi, Ashutosh; Wojtczak, Dominik

doi:10.4230/LIPIcs.CONCUR.2020.21

Abstract

This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluation of both reductions.
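To illustrate the kind of learner the abstract refers to, the following is a minimal sketch of tabular minimax Q-learning on a turn-based stochastic reachability game. It is not the paper's reduction or implementation: the four-state arena, the names `OWNER`, `TRANSITIONS`, `sample_successor`, `state_value` and `minimax_q_learning`, the uniform exploration policy, and the 1/n learning rate are all assumptions made for the example. In a turn-based game the minimax operator reduces to a max at Max's states and a min at Min's states, which is what the sketch uses. The learner only samples successors and never reads the transition probabilities.

```python
import random

# Hypothetical toy arena (not from the paper): state 0 belongs to Max,
# state 1 to Min; "win" is the reachability target, "lose" a losing sink.
OWNER = {0: "max", 1: "min"}

# TRANSITIONS[state][action] = list of (probability, successor).
# Used only by the simulator below; the learner never inspects it.
TRANSITIONS = {
    0: {"a": [(0.5, "win"), (0.5, "lose")], "b": [(1.0, 1)]},
    1: {"c": [(0.7, "win"), (0.3, "lose")], "d": [(0.4, "win"), (0.6, "lose")]},
}


def sample_successor(state, action, rng):
    """Simulate one step of the arena (the 'unknown' environment)."""
    outcomes = TRANSITIONS[state][action]
    successors = [s for _, s in outcomes]
    weights = [p for p, _ in outcomes]
    return rng.choices(successors, weights=weights, k=1)[0]


def state_value(q, state):
    """Current value estimate: 1 at the target, 0 at the sink,
    otherwise max (Max's state) or min (Min's state) over Q-values."""
    if state == "win":
        return 1.0
    if state == "lose":
        return 0.0
    pick = max if OWNER[state] == "max" else min
    return pick(q[state].values())


def minimax_q_learning(episodes=20000, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    visits = {s: {a: 0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        state = rng.choice(list(TRANSITIONS))      # random non-terminal start
        while state in TRANSITIONS:                # run until win/lose
            action = rng.choice(list(TRANSITIONS[state]))  # uniform exploration
            succ = sample_successor(state, action, rng)
            visits[state][action] += 1
            alpha = 1.0 / visits[state][action]    # decaying learning rate
            target = state_value(q, succ)          # undiscounted backup
            q[state][action] += alpha * (target - q[state][action])
            state = succ
    return q


if __name__ == "__main__":
    q = minimax_q_learning()
    print("estimated value at state 0:", state_value(q, 0))
    print("estimated value at state 1:", state_value(q, 1))
```

In this toy arena the exact values can be computed by hand: Min prefers action `d` at state 1 (reachability probability 0.4), so Max prefers action `a` at state 0 (probability 0.5). The learned estimates should approach these values as the number of episodes grows.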

