On the Complexity of Value Iteration (Track B: Automata, Logic, Semantics, and Theory of Programming)

Balaji, Nikhil; Kiefer, Stefan; Novotný, Petr; Pérez, Guillermo A.; Shirmohammadi, Mahsa

doi:10.4230/LIPIcs.ICALP.2019.102

File

LIPIcs.ICALP.2019.102.pdf

Filesize: 0.52 MB
15 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICALP.2019.102
URN: urn:nbn:de:0030-drops-106782

Author Details

Nikhil Balaji

University of Oxford, UK

Stefan Kiefer

University of Oxford, UK

Petr Novotný

Masaryk University, Brno, Czech Republic

Guillermo A. Pérez

University of Antwerp, Belgium

Mahsa Shirmohammadi

CNRS, Paris, France
IRIF, Paris, France

Acknowledgements

We thank James Worrell for helpful comments on early version of this work.

Cite AsGet BibTex

Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, and Mahsa Shirmohammadi. On the Complexity of Value Iteration (Track B: Automata, Logic, Semantics, and Theory of Programming). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 102:1-102:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICALP.2019.102

Abstract

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal n-step payoff by iterating n times a recurrence equation which is naturally associated to the MDP. At the same time, value iteration provides a policy for the MDP that is optimal on a given finite horizon n. In this paper, we settle the computational complexity of value iteration. We show that, given a horizon n in binary and an MDP, computing an optimal policy is EXPTIME-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis. To obtain this main result, we develop several stepping stones that yield results of an independent interest. For instance, we show that it is EXPTIME-complete to compute the n-fold iteration (with n in binary) of a function given by a straight-line program over the integers with max and + as operators. We also provide new complexity results for the bounded halting problem in linear-update counter machines.

Subject Classification

ACM Subject Classification

Theory of computation → Probabilistic computation
Theory of computation → Logic and verification
Theory of computation → Markov decision processes

Keywords

Markov decision processes
Value iteration
Formal verification

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Pieter Abbeel and Andrew Y. Ng. Learning first-order Markov models for control. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1-8. MIT Press, 2005. URL: http://papers.nips.cc/paper/2569-learning-first-order-markov-models-for-control.pdf.
Eric Allender, Nikhil Balaji, and Samir Datta. Low-Depth Uniform Threshold Circuits and the Bit-Complexity of Straight Line Programs. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger, and Zoltán Ésik, editors, Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part II, volume 8635 of Lecture Notes in Computer Science, pages 13-24. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-44465-8_2.
Eric Allender, Peter Bürgisser, Johan Kjeldgaard-Pedersen, and Peter Bro Miltersen. On the complexity of numerical analysis. SIAM Journal on Computing, 38(5):1987-2006, 2009.
Eric Allender, Andreas Krebs, and Pierre McKenzie. Better complexity bounds for cost register automata. Theory of Computing Systems, pages 1-19, 2017.
Christel Baier and Katoen Joost-Pieter, editors. Principles of Model Checking. MIT Press, 2008.
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819-840, 2002.
Dimitri P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1987.
Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena scientific Belmont, MA, 2005.
Vincent D Blondel and John N Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249-1274, 2000.
Nicole Bäuerle and Ulrich Rieder. Markov Decision Processes with Applications to Finance. Springer-Verlag Berlin Heidelberg, 2011.
Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Bloem Roderick, editors. Handbook of Model Checking. Springer International Publishing, 2018.
Wojciech Czerwinski, Slawomir Lasota, Ranko Lazic, Jérôme Leroux, and Filip Mazowiecki. The Reachability Problem for Petri Nets is Not Elementary (Extended Abstract). CoRR, abs/1809.07115, 2018. URL: http://arxiv.org/abs/1809.07115.
Peng Dai, Mausam, Daniel S. Weld, and Judy Goldsmith. Topological Value Iteration Algorithms. J. Artif. Intell. Res., 42:181-209, 2011. URL: http://jair.org/papers/paper3390.html.
Laurent Doyen, Thierry Massart, and Mahsa Shirmohammadi. Limit Synchronization in Markov Decision Processes. In Anca Muscholl, editor, Foundations of Software Science and Computation Structures - 17th International Conference, FOSSACS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014, Proceedings, volume 8412 of Lecture Notes in Computer Science, pages 58-72. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-642-54830-7_4.
John Fearnley and Rahul Savani. The Complexity of the Simplex Method. In Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 201-208, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2746539.2746558.
Judy Goldsmith, Michael L Littman, and Martin Mundhenk. The complexity of plan existence and evaluation in probabilistic domains. In Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence (UAI'97), pages 182-189. Morgan Kaufmann Publishers Inc., 1997.
Judy Goldsmith and Martin Mundhenk. Complexity Issues in Markov Decision Processes. In Proceedings of the 13th Annual IEEE Conference on Computational Complexity, Buffalo, New York, USA, June 15-18, 1998, pages 272-280. IEEE Computer Society, 1998. URL: http://dx.doi.org/10.1109/CCC.1998.694621.
R. Greenlaw, H.J. Hoover, and W.L. Ruzzo. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, 1995. URL: https://books.google.fr/books?id=YZHnCwAAQBAJ.
William Hesse, Eric Allender, and David A. Mix Barrington. Uniform constant-depth threshold circuits for division and iterated multiplication. J. Comput. Syst. Sci., 65(4):695-716, 2002. URL: http://dx.doi.org/10.1016/S0022-0000(02)00025-9.
Michael L Littman. Probabilistic propositional planning: Representations and complexity. In AAAI'97, pages 748-754, 1997.
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling. On the Complexity of Solving Markov Decision Problems. In Philippe Besnard and Steve Hanks, editors, UAI '95: Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec, Canada, August 18-20, 1995, pages 394-402. Morgan Kaufmann, 1995. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=457&proceeding_id=11.
Michael L Littman, Judy Goldsmith, and Martin Mundhenk. The computational complexity of probabilistic planning. Journal of Artificial Intelligence Research, 9:1-36, 1998.
Yishay Mansour and Satinder P. Singh. On the Complexity of Policy Iteration. In Kathryn B. Laskey and Henri Prade, editors, UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30 - August 1, 1999, pages 401-408. Morgan Kaufmann, 1999. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=192&proceeding_id=15.
Ernst W. Mayr. An algorithm for the general Petri net reachability problem. In Proceedings of the thirteenth annual ACM Symposium on Theory of computing (STOC'81), pages 238-246. ACM, 1981.
Martin Mundhenk, Judy Goldsmith, Christopher Lusena, and Eric Allender. Complexity of Finite-horizon Markov Decision Process Problems. J. ACM, 47(4):681-720, July 2000. URL: http://dx.doi.org/10.1145/347476.347480.
Christos H. Papadimitriou and John N. Tsitsiklis. The Complexity of Markov Decision Processes. Math. Oper. Res., 12(3):441-450, 1987. URL: http://dx.doi.org/10.1287/moor.12.3.441.
Christos H. Papadimitriou and Mihalis Yannakakis. A Note on Succinct Representations of Graphs. Information and Control, 71(3):181-185, 1986. URL: http://dx.doi.org/10.1016/S0019-9958(86)80009-2.
Martin L. Puterman. Markov Decision Processes. Wiley-Interscience, 2005.
Tim Quatmann and Joost-Pieter Katoen. Sound Value Iteration. In Hana Chockler and Georg Weissenbacher, editors, Computer Aided Verification, pages 643-661. Springer International Publishing, 2018.
Manfred Schäl. Markov decision processes in finance and dynamic options. In Handbook of Markov Decision Processes, International Series in Operations Research &Management Science, pages 461-487. Springer, 2002.
Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes, pages 770-787. SIAM, 2018. URL: http://dx.doi.org/10.1137/1.9781611975031.50.
Olivier Sigaud and Olivier Buffet. Markov Decision Processes in Artificial Intelligence. John Wiley &Sons, 2013.
R.S. Sutton and A.G Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 2018.
Aviv Tamar, YI WU, Garrett Thomas, Sergey Levine, and Pieter Abbeel. Value Iteration Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2154-2162. Curran Associates, Inc., 2016. URL: http://papers.nips.cc/paper/6046-value-iteration-networks.pdf.
Paul Tseng. Solving H-horizon, Stationary Markov Decision Problems in Time Proportional to log(H). Operations Research Letters, 9(5):287-297, 1990.
Zhimin Wu, Ernst Moritz Hahn, Akin Günay, Lijun Zhang, and Yang Liu. GPU-accelerated value iteration for the computation of reachability probabilities in mdps. In Gal A. Kaminka, Maria Fox, Paolo Bouquet, Eyke Hüllermeier, Virginia Dignum, Frank Dignum, and Frank van Harmelen, editors, ECAI 2016 - 22nd European Conference on Artificial Intelligence, pages 1726-1727. IOS Press, 2016. URL: http://dx.doi.org/10.3233/978-1-61499-672-9-1726.
Yinyu Ye. A New Complexity Result on Solving the Markov Decision Problem. Mathematics of Operations Research, 30(3):733-749, 2005.
Yinyu Ye. The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate. Mathematics of Operations Research, 36(4):593-603, 2011.