Anytime Guarantees for Reachability in Uncountable Markov Decision Processes

Authors: Kush Grover, Jan Křetínský, Tobias Meggendorfer, Maximilian Weininger




File
  • LIPIcs.CONCUR.2022.11.pdf (0.91 MB, 20 pages)

Document Identifiers
  • DOI: 10.4230/LIPIcs.CONCUR.2022.11

Author Details

Kush Grover
  • Technische Universität München, Germany
Jan Křetínský
  • Technische Universität München, Germany
Tobias Meggendorfer
  • Institute of Science and Technology Austria, Wien, Austria
Maximilian Weininger
  • Technische Universität München, Germany

Cite As

Kush Grover, Jan Křetínský, Tobias Meggendorfer, and Maximilian Weininger. Anytime Guarantees for Reachability in Uncountable Markov Decision Processes. In 33rd International Conference on Concurrency Theory (CONCUR 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 243, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.CONCUR.2022.11

Abstract

We consider the problem of approximating the reachability probabilities in Markov decision processes (MDP) with uncountable (continuous) state and action spaces. While there are algorithms that, for special classes of such MDP, provide a sequence of approximations converging to the true value in the limit, our aim is to obtain an algorithm with guarantees on the precision of the approximation. As this problem is undecidable in general, assumptions on the MDP are necessary. Our main contribution is to identify sufficient assumptions that are as weak as possible, thus approaching the "boundary" of which systems can be correctly and reliably analyzed. To this end, we also argue why each of our assumptions is necessary for algorithms based on processing finitely many observations. We present two solution variants. The first provides converging lower bounds under weaker assumptions than those typical of previous works concerned with guarantees. The second then uses stronger assumptions to additionally provide converging upper bounds. Altogether, we obtain an anytime algorithm, i.e., one yielding a sequence of approximants with known and iteratively improving precision, converging to the true value in the limit. Moreover, due to the generality of our assumptions, our algorithms are very general templates, readily accommodating various heuristics from the literature, in contrast to, e.g., a specific discretization algorithm. Our theoretical contribution thus paves the way for future practical improvements without sacrificing correctness guarantees.
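The anytime scheme the abstract describes, a converging lower bound plus, under stronger assumptions, a converging upper bound, so that the current precision is always known, can be conveyed in the much simpler finite-state setting. The following is a minimal, hypothetical Python sketch of interval iteration on a finite MDP (in the spirit of the finite-state techniques the paper generalizes); it is not the authors' algorithm for uncountable spaces, and all identifiers (anytime_reachability, the dictionary-based MDP encoding) are invented for illustration.

```python
# Hypothetical illustration only: plain interval iteration on a *finite*
# MDP, showing the anytime lower/upper-bound idea from the abstract.
# The paper's algorithms handle uncountable state and action spaces;
# the finite representation here is invented for this sketch.
#
# Simplifying assumptions: target and sink states are absorbing and
# already identified, and there are no other end components (otherwise
# the upper bound need not converge to the true value).

from typing import Dict, List, Set, Tuple

State = str
Action = str
# mdp[s][a] = list of (successor, probability) pairs
MDP = Dict[State, Dict[Action, List[Tuple[State, float]]]]


def anytime_reachability(mdp: MDP, targets: Set[State],
                         sinks: Set[State], eps: float):
    """Yield (lower, upper) bound maps; the gap is the known precision."""
    lo = {s: (1.0 if s in targets else 0.0) for s in mdp}
    hi = {s: (0.0 if s in sinks else 1.0) for s in mdp}
    while max(hi[s] - lo[s] for s in mdp) > eps:
        for bound in (lo, hi):
            new = {}
            for s in mdp:
                if s in targets or s in sinks:
                    new[s] = bound[s]  # absorbing: value is fixed
                else:
                    # Bellman update: best action w.r.t. current bound.
                    new[s] = max(sum(p * bound[t] for t, p in succ)
                                 for succ in mdp[s].values())
            bound.update(new)
        # Anytime guarantee: (lo, hi) brackets the true value at every step.
        yield dict(lo), dict(hi)


# Toy example: from s0, action "a" reaches the goal with probability 0.7.
toy: MDP = {
    "s0": {"a": [("g", 0.7), ("x", 0.3)], "b": [("g", 0.2), ("x", 0.8)]},
    "g": {"stay": [("g", 1.0)]},
    "x": {"stay": [("x", 1.0)]},
}
for lo, hi in anytime_reachability(toy, {"g"}, {"x"}, 1e-6):
    pass  # each iterate could already be reported with its precision
print(lo["s0"], hi["s0"])  # both 0.7
```

For uncountable spaces the paper replaces the exhaustive state sweep with algorithms that process finitely many observations under the identified assumptions; the sketch only conveys why simultaneously maintained lower and upper bounds yield an anytime guarantee.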

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Markov processes
  • Mathematics of computing → Continuous mathematics
  • Computing methodologies → Continuous models
Keywords
  • uncountable system
  • Markov decision process
  • discrete-time Markov control process
  • probabilistic verification
  • anytime guarantee

