A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs

Authors: Marnix Suilen, Marck van der Vegt, Sebastian Junges




File

LIPIcs.CONCUR.2024.40.pdf
  • Filesize: 0.87 MB
  • 17 pages

Author Details

Marnix Suilen
  • Radboud University, Nijmegen, The Netherlands
Marck van der Vegt
  • Radboud University, Nijmegen, The Netherlands
Sebastian Junges
  • Radboud University, Nijmegen, The Netherlands

Cite As

Marnix Suilen, Marck van der Vegt, and Sebastian Junges. A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs. In 35th International Conference on Concurrency Theory (CONCUR 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 311, pp. 40:1-40:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CONCUR.2024.40

Abstract

Markov Decision Processes (MDPs) model systems with uncertain transition dynamics. Multiple-environment MDPs (MEMDPs) extend MDPs: intuitively, a MEMDP is a finite set of MDPs that share the same state and action spaces but differ in their transition dynamics. The key objective in MEMDPs is to find a single strategy that satisfies a given objective in every associated MDP. The main result of this paper is PSPACE-completeness of almost-sure Rabin objectives in MEMDPs. This result clarifies the complexity landscape for MEMDPs and contrasts with results for the more general class of partially observable MDPs (POMDPs), where almost-sure reachability is already EXP-complete and almost-sure Rabin objectives are undecidable.
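To make the model concrete, the following is a minimal, hypothetical encoding of a MEMDP: a set of environments sharing states and actions but differing in transitions, and a single memoryless strategy that must work in every environment. All names (`env1`, `env2`, `reachable_under`) are illustrative assumptions, and the check below is only positive-probability reachability of a goal state, a much simpler stand-in than the paper's almost-sure Rabin analysis, which requires reasoning about end components.

```python
# Toy MEMDP sketch (assumed encoding, not the paper's formalism):
# each environment maps (state, action) to a distribution over
# successor states; both environments share STATES and ACTIONS.
STATES = {"s0", "s1", "goal"}
ACTIONS = {"a", "b"}

env1 = {
    ("s0", "a"): {"s1": 1.0},
    ("s0", "b"): {"s0": 1.0},
    ("s1", "a"): {"goal": 0.5, "s0": 0.5},
    ("s1", "b"): {"s1": 1.0},
}
env2 = {  # same states and actions, different dynamics
    ("s0", "a"): {"s1": 0.5, "s0": 0.5},
    ("s0", "b"): {"goal": 1.0},
    ("s1", "a"): {"goal": 1.0},
    ("s1", "b"): {"s0": 1.0},
}

def reachable_under(strategy, env, init="s0"):
    """States visited with positive probability when a single
    memoryless strategy (state -> action) is played in one
    environment; 'goal' is treated as absorbing."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        if s == "goal":
            continue
        for succ in env[(s, strategy[s])]:
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return seen

# The defining requirement of a MEMDP objective: one strategy,
# fixed up front, must succeed in *every* environment.
strategy = {"s0": "a", "s1": "a"}
for env in (env1, env2):
    assert "goal" in reachable_under(strategy, env)
```

Note that in this toy example the same strategy happens to reach `goal` with positive probability in both environments; deciding whether such a strategy exists for almost-sure Rabin objectives is exactly the PSPACE-complete problem the paper studies.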

Subject Classification

ACM Subject Classification
  • Theory of computation → Logic and verification
Keywords
  • Markov Decision Processes
  • partial observability
  • linear-time objectives

