Demonic Variance and a Non-Determinism Score for Markov Decision Processes

Author Jakob Piribauer



Author Details

Jakob Piribauer
  • Technische Universität Dresden, Germany
  • Universität Leipzig, Germany

Cite As

Jakob Piribauer. Demonic Variance and a Non-Determinism Score for Markov Decision Processes. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 79:1-79:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.MFCS.2024.79

Abstract

This paper studies the influence of probabilism and non-determinism on some quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: for a random variable X in an MDP ℳ, it is defined as 1/2 times the maximal expected squared distance between the values of X in two independent executions of ℳ in which the non-deterministic choices are also resolved independently, by two separate schedulers. It is shown that the demonic variance is between 1 and 2 times as large as the maximal variance of X in ℳ that can be achieved by a single scheduler. This allows defining a non-determinism score for ℳ and X that measures how strongly the difference between the values of X in two executions of ℳ can be influenced by the non-deterministic choices. Properties of MDPs ℳ with extremal values of the non-determinism score are established. Further, the algorithmic problems of computing the maximal variance and the demonic variance are investigated for two random variables, namely weighted reachability and accumulated rewards. In the process, the structure of schedulers maximizing the variance and of scheduler pairs realizing the demonic variance is also analyzed.
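The factor-1-to-2 bound stated in the abstract can be made plausible with a short calculation. What follows is a sketch, not the paper's proof, under these assumptions: X has finite second moments under every scheduler; X_1 and X_2 denote the values of X in two independent executions under schedulers σ and τ; dVar and V_max are notation chosen here for the demonic variance and the maximal variance; and randomized history-dependent schedulers are allowed, so that flipping a fair coin once and then following σ or τ throughout is itself a scheduler ρ.

```latex
\begin{align*}
\mathrm{dVar}_{\mathcal{M}}(X)
  &= \sup_{\sigma,\tau}\ \tfrac12\,\mathbb{E}\big[(X_1 - X_2)^2\big],
  \qquad V_{\max} = \sup_{\sigma}\ \mathrm{Var}^{\sigma}(X),\\
\mathbb{E}\big[(X_1 - X_2)^2\big]
  &= \mathrm{Var}^{\sigma}(X) + \mathrm{Var}^{\tau}(X)
     + \big(\mathbb{E}^{\sigma}(X) - \mathbb{E}^{\tau}(X)\big)^2,\\
\mathrm{Var}^{\rho}(X)
  &= \tfrac12\big(\mathrm{Var}^{\sigma}(X) + \mathrm{Var}^{\tau}(X)\big)
     + \tfrac14\big(\mathbb{E}^{\sigma}(X) - \mathbb{E}^{\tau}(X)\big)^2,\\
\tfrac12\,\mathbb{E}\big[(X_1 - X_2)^2\big]
  &= \tfrac12\big(\mathrm{Var}^{\sigma}(X) + \mathrm{Var}^{\tau}(X)\big)
     + \tfrac12\big(\mathbb{E}^{\sigma}(X) - \mathbb{E}^{\tau}(X)\big)^2
  \;\le\; 2\,\mathrm{Var}^{\rho}(X) \;\le\; 2\,V_{\max}.
\end{align*}
```

Choosing σ = τ makes the squared difference of means vanish, so ½E[(X_1 − X_2)²] = Var^σ(X) and hence V_max ≤ dVar. Taking the supremum over (σ, τ) in the last line gives dVar ≤ 2·V_max.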
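To make the definitions concrete, here is a minimal sketch for a hypothetical one-shot MDP: a single decision state whose actions lead directly to terminal outcomes, so every randomized scheduler is just a probability distribution over the actions. It uses the decomposition E[(X_1 − X_2)²] = Var^σ(X) + Var^τ(X) + (E^σ(X) − E^τ(X))² for independent runs, and it searches only a finite grid of schedulers, so it approximates the true maxima from below. It is not the algorithm from the paper, which handles weighted reachability and accumulated rewards in general MDPs; the function names, the action encoding, and the grid search are assumptions made for illustration.

```python
from itertools import product


def compositions(n, total):
    """Yield all n-tuples of non-negative integers that sum to total."""
    if n == 1:
        yield (total,)
        return
    for k in range(total + 1):
        for rest in compositions(n - 1, total - k):
            yield (k,) + rest


def moments(actions, weights):
    """Mean and variance of X when action i is chosen with probability weights[i].

    Hypothetical encoding: each action is a list of (value of X, probability) pairs.
    """
    mean = sum(w * p * x for w, outcomes in zip(weights, actions) for x, p in outcomes)
    second = sum(w * p * x * x for w, outcomes in zip(weights, actions) for x, p in outcomes)
    return mean, second - mean * mean


def max_and_demonic_variance(actions, steps=100):
    """Grid-search approximation of (maximal variance, demonic variance)
    in a one-shot MDP; scheduler probabilities are multiples of 1/steps."""
    stats = [moments(actions, tuple(k / steps for k in c))
             for c in compositions(len(actions), steps)]
    var_max = max(v for _, v in stats)
    # 1/2 * E[(X1 - X2)^2] for independent runs under two schedulers
    # with moments (m1, v1) and (m2, v2).
    demonic = max(0.5 * (v1 + v2 + (m1 - m2) ** 2)
                  for (m1, v1), (m2, v2) in product(stats, repeat=2))
    return var_max, demonic


if __name__ == "__main__":
    examples = {
        "pure choice between 0 and 1": [[(0.0, 1.0)], [(1.0, 1.0)]],
        "no choice, fair coin over 0 and 2": [[(0.0, 0.5), (2.0, 0.5)]],
    }
    for name, actions in examples.items():
        var_max, demonic = max_and_demonic_variance(actions)
        print(f"{name}: Var_max={var_max:.3f}, demonic={demonic:.3f}, "
              f"ratio={demonic / var_max:.3f}")
```

In the first example all spread comes from the non-deterministic choice: the best single scheduler randomizes 50/50 and reaches variance 0.25, while the pair "always 0" versus "always 1" gives demonic variance 0.5, a ratio of 2. In the second example there is no choice at all, so both quantities equal 1 and the ratio is 1.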

ACM Subject Classification
  • Theory of computation → Logic and verification
Keywords
  • Markov decision processes
  • variance
  • non-determinism
  • probabilism
