Finite-Memory Strategies for Almost-Sure Energy-MeanPayoff Objectives in MDPs

Authors: Mohan Dantam, Richard Mayr




Author Details

Mohan Dantam
  • School of Informatics, University of Edinburgh, UK
Richard Mayr
  • School of Informatics, University of Edinburgh, UK

Cite As

Mohan Dantam and Richard Mayr. Finite-Memory Strategies for Almost-Sure Energy-MeanPayoff Objectives in MDPs. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 133:1-133:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.133

Abstract

We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.
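As an informal illustration of the two components of the objective, the following minimal sketch checks them on a finite prefix of a run. It is not taken from the paper: the function names, the encoding of a step as a pair (energy change, payoff), and the sample prefix are all assumptions made for this example. The actual objective concerns infinite runs and the long-run mean payoff, which a finite prefix can only approximate.

```python
from typing import List, Tuple

# Hypothetical encoding: each step is (energy_delta, payoff), i.e. the change
# in the energy dimension and the reward in the mean-payoff dimension.
Step = Tuple[int, int]


def energy_stays_nonnegative(initial_energy: int, steps: List[Step]) -> bool:
    """Return True if the energy level never drops below zero on this prefix."""
    level = initial_energy
    for energy_delta, _ in steps:
        level += energy_delta
        if level < 0:
            return False
    return True


def prefix_mean_payoff(steps: List[Step]) -> float:
    """Average payoff per step over a finite, non-empty prefix."""
    if not steps:
        raise ValueError("prefix must be non-empty")
    return sum(payoff for _, payoff in steps) / len(steps)


# Illustrative prefix (made up for this example).
prefix = [(-1, 2), (2, -1), (-1, 2), (2, -1)]

print(energy_stays_nonnegative(1, prefix))  # energy condition with initial energy 1
print(energy_stays_nonnegative(0, prefix))  # energy condition with initial energy 0
print(prefix_mean_payoff(prefix))           # average payoff on the prefix
```

With initial energy 1 the prefix never runs out of energy, while with initial energy 0 the first step already does. This shows why the energy condition depends on the initial energy level.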

Subject Classification

ACM Subject Classification
  • Theory of computation → Random walks and Markov chains
  • Mathematics of computing → Probability and statistics
Keywords
  • Markov decision processes
  • energy
  • mean payoff
  • parity
  • strategy complexity

