Monte Carlo Tree Search Guided by Symbolic Advice for MDPs

Authors Damien Busatto-Gaston , Debraj Chakraborty , Jean-Francois Raskin



PDF
Thumbnail PDF

File

LIPIcs.CONCUR.2020.40.pdf
  • Filesize: 0.65 MB
  • 24 pages

Document Identifiers

Author Details

Damien Busatto-Gaston
  • Université Libre de Bruxelles, Brussels, Belgium
Debraj Chakraborty
  • Université Libre de Bruxelles, Brussels, Belgium
Jean-Francois Raskin
  • Université Libre de Bruxelles, Brussels, Belgium

Acknowledgements

Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11.

Cite AsGet BibTex

Damien Busatto-Gaston, Debraj Chakraborty, and Jean-Francois Raskin. Monte Carlo Tree Search Guided by Symbolic Advice for MDPs. In 31st International Conference on Concurrency Theory (CONCUR 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 171, pp. 40:1-40:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CONCUR.2020.40

Abstract

In this paper, we consider the online computation of a strategy that aims at optimizing the expected average reward in a Markov decision process. The strategy is computed with a receding horizon and using Monte Carlo tree search (MCTS). We augment the MCTS algorithm with the notion of symbolic advice, and show that its classical theoretical guarantees are maintained. Symbolic advice are used to bias the selection and simulation strategies of MCTS. We describe how to use QBF and SAT solvers to implement symbolic advice in an efficient way. We illustrate our new algorithm using the popular game Pac-Man and show that the performances of our algorithm exceed those of plain MCTS as well as the performances of human players.

Subject Classification

ACM Subject Classification
  • Theory of computation → Theory of randomized search heuristics
  • Theory of computation → Convergence and learning in games
Keywords
  • Markov decision process
  • Monte Carlo tree search
  • symbolic advice
  • simulation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI 2018), pages 2669-2678. AAAI Press, 2018. Google Scholar
  2. Pranav Ashok, Tomás Brázdil, Jan Kretínský, and Ondrej Slámecka. Monte Carlo tree search for verifying reachability in Markov decision processes. In Proceedings of the 8th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2018), volume 11245 of Lecture Notes in Computer Science, pages 322-335. Springer, 2018. URL: https://doi.org/10.1007/978-3-030-03421-4_21.
  3. Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002. URL: https://doi.org/10.1023/A:1013689704352.
  4. Hendrik Baier and Mark H. M. Winands. MCTS-minimax hybrids. IEEE Transactions on Computational Intelligence and AI in Games, 7(2):167-179, 2015. URL: https://doi.org/10.1109/TCIAIG.2014.2366555.
  5. Armin Biere, Keijo Heljanko, Tommi Junttila, Timo Latvala, and Viktor Schuppan. Linear encodings of bounded LTL model checking. Logical Methods in Computer Science, Volume 2, Issue 5, November 2006. URL: https://doi.org/10.2168/LMCS-2(5:5)2006.
  6. Tomás Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtech Forejt, Jan Kretínský, Marta Z. Kwiatkowska, David Parker, and Mateusz Ujma. Verification of Markov decision processes using learning algorithms. In Proceedings of the 12th International Symposium on Automated Technology for Verification and Analysis (ATVA 2014), volume 8837 of Lecture Notes in Computer Science, pages 98-114. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-11936-6_8.
  7. Cameron Browne, Edward Jack Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez Liebana, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-43, 2012. URL: https://doi.org/10.1109/TCIAIG.2012.2186810.
  8. Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi. Distribution-aware sampling and weighted model counting for SAT. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014 (AAAI 2014), pages 1722-1730. AAAI Press, 2014. URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8364.
  9. Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi. On parallel scalable uniform SAT witness generation. In Proceedings of the 21st International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2015), Held as Part of the European Joint Conferences on Theory and Practice of Software (ETAPS 2015), volume 9035 of Lecture Notes in Computer Science, pages 304-319. Springer, 2015. URL: https://doi.org/10.1007/978-3-662-46681-0_25.
  10. Guillaume M. J. B. Chaslot, Mark H. M. Winands, and H. Jaap van den Herik. Parallel Monte-Carlo tree search. In H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, and Mark H. M. Winands, editors, Proceedings of the 6th International Conference on Computers and Games (CG 2008), volume 5131 of Lecture Notes in Computer Science, pages 60-71. Springer, 2008. URL: https://doi.org/10.1007/978-3-540-87608-3_6.
  11. Krishnendu Chatterjee, Petr Novotný, Guillermo A. Pérez, Jean-François Raskin, and Dorde Zikelic. Optimizing expectation with guarantees in POMDPs. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pages 3725-3732. AAAI Press, 2017. Google Scholar
  12. Przemyslaw Daca, Thomas A. Henzinger, Jan Kretínský, and Tatjana Petrov. Faster statistical model checking for unbounded temporal properties. In Proceedings of the 22nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2016), Held as Part of the European Joint Conferences on Theory and Practice of Software (ETAPS 2016), volume 9636 of Lecture Notes in Computer Science, pages 112-129. Springer, 2016. URL: https://doi.org/10.1007/978-3-662-49674-9_7.
  13. John DeNero and Dan Klein. CS 188 : Introduction to artificial intelligence. URL: https://inst.eecs.berkeley.edu/~cs188.
  14. Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. Cautious reinforcement learning with logical constraints. In Amal El Fallah Seghrouchni, Gita Sukthankar, Bo An, and Neil Yorke-Smith, editors, Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, (AAMAS 2020), pages 483-491. International Foundation for Autonomous Agents and Multiagent Systems, 2020. URL: https://dl.acm.org/doi/abs/10.5555/3398761.3398821.
  15. Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963. URL: https://doi.org/10.1080/01621459.1963.10500830.
  16. Jing Huang, Zhiqing Liu, Benjie Lu, and Feng Xiao. Pruning in UCT algorithm. In Proceedings of the International Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010), pages 177-181, 2010. URL: https://doi.org/10.1109/TAAI.2010.38.
  17. Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49(2-3):193-208, 2002. URL: https://doi.org/10.1023/A:1017932429737.
  18. Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning (ECML 2006), volume 4212 of Lecture Notes in Computer Science, pages 282-293. Springer, 2006. URL: https://doi.org/10.1007/11871842_29.
  19. Jan Kretínský, Guillermo A. Pérez, and Jean-François Raskin. Learning-based mean-payoff optimization in an unknown MDP under omega-regular constraints. In Proceedings of the 29th International Conference on Concurrency Theory (CONCUR 2018), volume 118 of Leibniz International Proceedings in Informatics, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  20. Christof Löding. Infinite games and automata theory. In Krzysztof R. Apt and Erich Grädel, editors, Lectures in Game Theory for Computer Scientists, pages 38-73. Cambridge University Press, 2011. Google Scholar
  21. João Marques-Silva and Sharad Malik. Propositional SAT solving. In Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem, editors, Handbook of Model Checking, pages 247-275. Springer, 2018. Google Scholar
  22. Nina Narodytska, Alexander Legg, Fahiem Bacchus, Leonid Ryzhyk, and Adam Walker. Solving games without controllable predecessor. In Proceedings of the 26th International Conference on Computer Aided Verification (CAV 2014), volume 8559 of Lecture Notes in Computer Science, pages 533-540. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-08867-9_35.
  23. Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley, 1994. URL: https://doi.org/10.1002/9780470316887.
  24. Ankit Shukla, Armin Biere, Luca Pulina, and Martina Seidl. A survey on applications of quantified Boolean formulas. In Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2019), pages 78-84. IEEE, 2019. URL: https://doi.org/10.1109/ICTAI.2019.00020.
  25. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016. URL: https://doi.org/10.1038/nature16961.
  26. David Silver and Gerald Tesauro. Monte-Carlo simulation balancing. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 945-952, 2009. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail