Taming Infinity One Chunk at a Time: Concisely Represented Strategies in One-Counter MDPs

Ajdarów, Michal; Main, James C. A.; Novotný, Petr; Randour, Mickael

doi:10.4230/LIPIcs.ICALP.2025.138

Abstract

Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objectives: reaching a target state (state-reachability), and reaching a target state with counter value zero (selective termination). The synthesis problem for the latter is not known to be decidable and is connected to major open problems in number theory. Furthermore, even seemingly simple strategies (e.g., memoryless ones) in OC-MDPs might be impossible to build in practice (due to the underlying infinite configuration space): we need finite, and preferably small, representations.
To overcome these obstacles, we introduce two natural classes of concisely represented strategies based on a (possibly infinite) partition of counter values into intervals. For both classes, and both objectives, we study the verification problem (does a given strategy ensure a high enough probability for the objective?), and two synthesis problems (does there exist such a strategy?): one where the interval partition is fixed as input, and one where it is only parameterized. We develop a generic approach based on a compression of the induced infinite MDP that yields decidability in all cases, with all complexities within PSPACE.
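To make the idea of an interval-based strategy concrete, here is a minimal Python sketch of one possible finite representation. It is an illustration only, not the paper's formal definition: the class name `IntervalStrategy`, its fields, the state and action names, and the choice to partition only positive counter values (treating counter value 0 as terminal) are all assumptions made for this example. The sketch stores the lower bounds of finitely many intervals, the last of which is unbounded above, and one action per pair of state and interval.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class IntervalStrategy:
    """Hypothetical memoryless strategy indexed by (state, interval of counter values)."""

    # Sorted lower bounds of the intervals. With (1, 5, 100) the intervals are
    # [1, 4], [5, 99] and [100, infinity). Assumption: only positive counter
    # values are partitioned.
    lower_bounds: tuple
    # Maps (state, interval index) to the action played there.
    choice: dict

    def interval_index(self, counter: int) -> int:
        """Return the index of the interval containing the counter value."""
        if counter < self.lower_bounds[0]:
            raise ValueError("counter value lies below the first interval")
        # bisect_right counts how many lower bounds are <= counter.
        return bisect_right(self.lower_bounds, counter) - 1

    def action(self, state: str, counter: int) -> str:
        """Look up the action for a configuration (state, counter value)."""
        return self.choice[(state, self.interval_index(counter))]


# Illustrative instance with one state "s" and two made-up actions.
strategy = IntervalStrategy(
    lower_bounds=(1, 5, 100),
    choice={("s", 0): "decrement", ("s", 1): "wait", ("s", 2): "decrement"},
)
print(strategy.action("s", 3))        # counter 3 lies in [1, 4]
print(strategy.action("s", 10**9))    # any value from 100 upwards shares one entry
```

The table has one entry per state and interval, so its size does not depend on how large the counter can grow, which is the point of asking for a small representation over an infinite configuration space.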

