Model-Free Reinforcement Learning for Stochastic Parity Games

Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak



  • 16 pages


Author Details

Ernst Moritz Hahn
  • University of Twente, Enschede, The Netherlands
Mateo Perez
  • University of Colorado Boulder, CO, USA
Sven Schewe
  • University of Liverpool, UK
Fabio Somenzi
  • University of Colorado Boulder, CO, USA
Ashutosh Trivedi
  • University of Colorado Boulder, CO, USA
Dominik Wojtczak
  • University of Liverpool, UK

Cite As

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Model-Free Reinforcement Learning for Stochastic Parity Games. In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluation of both reductions.
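To illustrate the learning step the abstract refers to, here is a minimal sketch of minimax Q-learning on a turn-based stochastic reachability game. It does not implement the paper's ε-parameterized reduction; the game below (`OWNER`, `ACTIONS`, `DYNAMICS`, the absorbing `"goal"` and `"sink"` states) is a hypothetical toy example, and the episode count, exploration rate `explore`, and step size 1/N(s, a) are assumptions chosen for illustration. Because only one player moves in each state of a turn-based game, the minimax backup reduces to a plain max (Max's states) or min (Min's states). The learner only samples successors through `step()` and never reads the transition probabilities, which is what makes the approach model-free.

```python
import random
from collections import defaultdict

# Hypothetical toy turn-based stochastic reachability game (not from the paper).
# OWNER[s] says which player chooses the action in state s.
OWNER = {0: "max", 1: "min"}
ACTIONS = {0: ["a", "b"], 1: ["c", "d"]}
# DYNAMICS[(s, a)] lists (probability, successor); "goal" and "sink" are absorbing.
# The learner never reads this table directly; it only calls step().
DYNAMICS = {
    (0, "a"): [(1.0, 1)],
    (0, "b"): [(0.5, "goal"), (0.5, "sink")],
    (1, "c"): [(0.9, "goal"), (0.1, "sink")],
    (1, "d"): [(0.7, "goal"), (0.3, "sink")],
}


def step(state, action, rng):
    """Sample a successor, as a black-box simulator would."""
    u = rng.random()
    cumulative = 0.0
    for prob, succ in DYNAMICS[(state, action)]:
        cumulative += prob
        if u < cumulative:
            return succ
    return DYNAMICS[(state, action)][-1][1]  # guard against rounding


def state_value(q, state):
    """Minimax backup: Max maximizes and Min minimizes over its own actions."""
    values = [q[(state, a)] for a in ACTIONS[state]]
    return max(values) if OWNER[state] == "max" else min(values)


def greedy(q, state):
    """Best action for the player who owns `state` under the current estimates."""
    pick = max if OWNER[state] == "max" else min
    return pick(ACTIONS[state], key=lambda a: q[(state, a)])


def minimax_q_learning(episodes=20000, explore=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q(s, a), initialized to 0
    visits = defaultdict(int)  # N(s, a), used for the step size 1 / N(s, a)
    for _ in range(episodes):
        state = 0
        while state not in ("goal", "sink"):
            # Both players explore uniformly at random with probability `explore`.
            if rng.random() < explore:
                action = rng.choice(ACTIONS[state])
            else:
                action = greedy(q, state)
            succ = step(state, action, rng)
            if succ == "goal":
                target = 1.0  # reachability objective satisfied
            elif succ == "sink":
                target = 0.0  # objective can no longer be satisfied
            else:
                target = state_value(q, succ)  # undiscounted bootstrap
            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = succ
    return q


q = minimax_q_learning()
print({s: round(state_value(q, s), 2) for s in OWNER})
```

In this toy game the exact values are 0.7 for both states: Min prefers `d` (0.7 < 0.9) and Max prefers `a` (0.7 > 0.5), so the printed estimates should be close to 0.7, and `greedy` recovers these mutual best responses.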

Subject Classification

ACM Subject Classification
  • Theory of computation → Automata over infinite objects
  • Computing methodologies → Machine learning algorithms
  • Mathematics of computing → Markov processes
  • Theory of computation → Convergence and learning in games

Keywords
  • Reinforcement learning
  • Stochastic games
  • Omega-regular objectives



