Document Open Access Logo

On the Complexity of Value Iteration (Track B: Automata, Logic, Semantics, and Theory of Programming)

Authors Nikhil Balaji, Stefan Kiefer, Petr Novotný , Guillermo A. Pérez , Mahsa Shirmohammadi



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2019.102.pdf
  • Filesize: 0.52 MB
  • 15 pages

Document Identifiers

Author Details

Nikhil Balaji
  • University of Oxford, UK
Stefan Kiefer
  • University of Oxford, UK
Petr Novotný
  • Masaryk University, Brno, Czech Republic
Guillermo A. Pérez
  • University of Antwerp, Belgium
Mahsa Shirmohammadi
  • CNRS, Paris, France
  • IRIF, Paris, France

Acknowledgements

We thank James Worrell for helpful comments on early version of this work.

Cite AsGet BibTex

Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, and Mahsa Shirmohammadi. On the Complexity of Value Iteration (Track B: Automata, Logic, Semantics, and Theory of Programming). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 102:1-102:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICALP.2019.102

Abstract

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal n-step payoff by iterating n times a recurrence equation which is naturally associated to the MDP. At the same time, value iteration provides a policy for the MDP that is optimal on a given finite horizon n. In this paper, we settle the computational complexity of value iteration. We show that, given a horizon n in binary and an MDP, computing an optimal policy is EXPTIME-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis. To obtain this main result, we develop several stepping stones that yield results of an independent interest. For instance, we show that it is EXPTIME-complete to compute the n-fold iteration (with n in binary) of a function given by a straight-line program over the integers with max and + as operators. We also provide new complexity results for the bounded halting problem in linear-update counter machines.

Subject Classification

ACM Subject Classification
  • Theory of computation → Probabilistic computation
  • Theory of computation → Logic and verification
  • Theory of computation → Markov decision processes
Keywords
  • Markov decision processes
  • Value iteration
  • Formal verification

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pieter Abbeel and Andrew Y. Ng. Learning first-order Markov models for control. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1-8. MIT Press, 2005. URL: http://papers.nips.cc/paper/2569-learning-first-order-markov-models-for-control.pdf.
  2. Eric Allender, Nikhil Balaji, and Samir Datta. Low-Depth Uniform Threshold Circuits and the Bit-Complexity of Straight Line Programs. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger, and Zoltán Ésik, editors, Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part II, volume 8635 of Lecture Notes in Computer Science, pages 13-24. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-44465-8_2.
  3. Eric Allender, Peter Bürgisser, Johan Kjeldgaard-Pedersen, and Peter Bro Miltersen. On the complexity of numerical analysis. SIAM Journal on Computing, 38(5):1987-2006, 2009. Google Scholar
  4. Eric Allender, Andreas Krebs, and Pierre McKenzie. Better complexity bounds for cost register automata. Theory of Computing Systems, pages 1-19, 2017. Google Scholar
  5. Christel Baier and Katoen Joost-Pieter, editors. Principles of Model Checking. MIT Press, 2008. Google Scholar
  6. Richard Bellman. Dynamic Programming. Princeton University Press, 1957. Google Scholar
  7. Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819-840, 2002. Google Scholar
  8. Dimitri P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1987. Google Scholar
  9. Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena scientific Belmont, MA, 2005. Google Scholar
  10. Vincent D Blondel and John N Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249-1274, 2000. Google Scholar
  11. Nicole Bäuerle and Ulrich Rieder. Markov Decision Processes with Applications to Finance. Springer-Verlag Berlin Heidelberg, 2011. Google Scholar
  12. Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Bloem Roderick, editors. Handbook of Model Checking. Springer International Publishing, 2018. Google Scholar
  13. Wojciech Czerwinski, Slawomir Lasota, Ranko Lazic, Jérôme Leroux, and Filip Mazowiecki. The Reachability Problem for Petri Nets is Not Elementary (Extended Abstract). CoRR, abs/1809.07115, 2018. URL: http://arxiv.org/abs/1809.07115.
  14. Peng Dai, Mausam, Daniel S. Weld, and Judy Goldsmith. Topological Value Iteration Algorithms. J. Artif. Intell. Res., 42:181-209, 2011. URL: http://jair.org/papers/paper3390.html.
  15. Laurent Doyen, Thierry Massart, and Mahsa Shirmohammadi. Limit Synchronization in Markov Decision Processes. In Anca Muscholl, editor, Foundations of Software Science and Computation Structures - 17th International Conference, FOSSACS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014, Proceedings, volume 8412 of Lecture Notes in Computer Science, pages 58-72. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-642-54830-7_4.
  16. John Fearnley and Rahul Savani. The Complexity of the Simplex Method. In Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 201-208, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2746539.2746558.
  17. Judy Goldsmith, Michael L Littman, and Martin Mundhenk. The complexity of plan existence and evaluation in probabilistic domains. In Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence (UAI'97), pages 182-189. Morgan Kaufmann Publishers Inc., 1997. Google Scholar
  18. Judy Goldsmith and Martin Mundhenk. Complexity Issues in Markov Decision Processes. In Proceedings of the 13th Annual IEEE Conference on Computational Complexity, Buffalo, New York, USA, June 15-18, 1998, pages 272-280. IEEE Computer Society, 1998. URL: http://dx.doi.org/10.1109/CCC.1998.694621.
  19. R. Greenlaw, H.J. Hoover, and W.L. Ruzzo. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, 1995. URL: https://books.google.fr/books?id=YZHnCwAAQBAJ.
  20. William Hesse, Eric Allender, and David A. Mix Barrington. Uniform constant-depth threshold circuits for division and iterated multiplication. J. Comput. Syst. Sci., 65(4):695-716, 2002. URL: http://dx.doi.org/10.1016/S0022-0000(02)00025-9.
  21. Michael L Littman. Probabilistic propositional planning: Representations and complexity. In AAAI'97, pages 748-754, 1997. Google Scholar
  22. Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling. On the Complexity of Solving Markov Decision Problems. In Philippe Besnard and Steve Hanks, editors, UAI '95: Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec, Canada, August 18-20, 1995, pages 394-402. Morgan Kaufmann, 1995. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=457&proceeding_id=11.
  23. Michael L Littman, Judy Goldsmith, and Martin Mundhenk. The computational complexity of probabilistic planning. Journal of Artificial Intelligence Research, 9:1-36, 1998. Google Scholar
  24. Yishay Mansour and Satinder P. Singh. On the Complexity of Policy Iteration. In Kathryn B. Laskey and Henri Prade, editors, UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30 - August 1, 1999, pages 401-408. Morgan Kaufmann, 1999. URL: https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=192&proceeding_id=15.
  25. Ernst W. Mayr. An algorithm for the general Petri net reachability problem. In Proceedings of the thirteenth annual ACM Symposium on Theory of computing (STOC'81), pages 238-246. ACM, 1981. Google Scholar
  26. Martin Mundhenk, Judy Goldsmith, Christopher Lusena, and Eric Allender. Complexity of Finite-horizon Markov Decision Process Problems. J. ACM, 47(4):681-720, July 2000. URL: http://dx.doi.org/10.1145/347476.347480.
  27. Christos H. Papadimitriou and John N. Tsitsiklis. The Complexity of Markov Decision Processes. Math. Oper. Res., 12(3):441-450, 1987. URL: http://dx.doi.org/10.1287/moor.12.3.441.
  28. Christos H. Papadimitriou and Mihalis Yannakakis. A Note on Succinct Representations of Graphs. Information and Control, 71(3):181-185, 1986. URL: http://dx.doi.org/10.1016/S0019-9958(86)80009-2.
  29. Martin L. Puterman. Markov Decision Processes. Wiley-Interscience, 2005. Google Scholar
  30. Tim Quatmann and Joost-Pieter Katoen. Sound Value Iteration. In Hana Chockler and Georg Weissenbacher, editors, Computer Aided Verification, pages 643-661. Springer International Publishing, 2018. Google Scholar
  31. Manfred Schäl. Markov decision processes in finance and dynamic options. In Handbook of Markov Decision Processes, International Series in Operations Research &Management Science, pages 461-487. Springer, 2002. Google Scholar
  32. Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes, pages 770-787. SIAM, 2018. URL: http://dx.doi.org/10.1137/1.9781611975031.50.
  33. Olivier Sigaud and Olivier Buffet. Markov Decision Processes in Artificial Intelligence. John Wiley &Sons, 2013. Google Scholar
  34. R.S. Sutton and A.G Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 2018. Google Scholar
  35. Aviv Tamar, YI WU, Garrett Thomas, Sergey Levine, and Pieter Abbeel. Value Iteration Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2154-2162. Curran Associates, Inc., 2016. URL: http://papers.nips.cc/paper/6046-value-iteration-networks.pdf.
  36. Paul Tseng. Solving H-horizon, Stationary Markov Decision Problems in Time Proportional to log(H). Operations Research Letters, 9(5):287-297, 1990. Google Scholar
  37. Zhimin Wu, Ernst Moritz Hahn, Akin Günay, Lijun Zhang, and Yang Liu. GPU-accelerated value iteration for the computation of reachability probabilities in mdps. In Gal A. Kaminka, Maria Fox, Paolo Bouquet, Eyke Hüllermeier, Virginia Dignum, Frank Dignum, and Frank van Harmelen, editors, ECAI 2016 - 22nd European Conference on Artificial Intelligence, pages 1726-1727. IOS Press, 2016. URL: http://dx.doi.org/10.3233/978-1-61499-672-9-1726.
  38. Yinyu Ye. A New Complexity Result on Solving the Markov Decision Problem. Mathematics of Operations Research, 30(3):733-749, 2005. Google Scholar
  39. Yinyu Ye. The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate. Mathematics of Operations Research, 36(4):593-603, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail