Hedging Bets in Markov Decision Processes

Cite As

Rajeev Alur, Marco Faella, Sampath Kannan, and Nimit Singhania. Hedging Bets in Markov Decision Processes. In 25th EACSL Annual Conference on Computer Science Logic (CSL 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 62, pp. 29:1-29:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.CSL.2016.29

Abstract

The classical model of Markov decision processes with costs or rewards, while widely used to formalize optimal decision making, cannot capture scenarios where the agent pursues multiple objectives during the system evolution, but only one of these objectives gets actualized upon termination. We introduce the model of Markov decision processes with alternative objectives (MDPAO) for formalizing optimization in such scenarios. To compute a strategy that optimizes the expected cost/reward upon termination, we need to determine how to balance the values of the alternative objectives. This requires analysis of the underlying infinite-state process that tracks the accumulated values of all the objectives. While the decidability of the problem of computing the exact optimal strategy for the general model remains open, we present the following results. First, for a Markov chain with alternative objectives, the optimal expected cost/reward can be computed in polynomial time. Second, for a single-state process with two actions and multiple objectives, we show how to compute the optimal decision strategy. Third, for a process with only two alternative objectives, we present a reduction to the minimum expected accumulated reward problem for one-counter MDPs, which leads to decidability for this case under some technical restrictions. Finally, we show that the optimal cost/reward can be approximated up to a constant additive factor for the general problem.
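
To make the "infinite-state process that tracks the accumulated values of all the objectives" concrete, the following is a minimal sketch for a Markov chain (no decisions). It is not the paper's polynomial-time algorithm: it simply unfolds the chain step by step up to a cutoff. The abstract does not say how the actualized objective is selected, so the sketch assumes, purely for illustration, that a caller-supplied `combine` function (here `min` or `max`) maps the accumulated vector to the final value. The function name, the transition encoding, and the example chain are all hypothetical.

```python
from collections import defaultdict


def expected_terminal_value(transitions, start, combine, max_steps):
    """Unfold a Markov chain with vector rewards for at most max_steps steps.

    transitions maps a state to a list of (prob, successor, rewards) triples;
    a successor of None means the chain terminates on that transition.
    combine turns the accumulated reward vector into the final value
    (ASSUMPTION: the source abstract does not fix this rule).

    Returns (expected value over runs terminated so far, terminated mass).
    """
    k = len(transitions[start][0][2])
    # A configuration is (chain state, accumulated vector); this is the
    # unbounded state space that the abstract refers to.
    frontier = {(start, (0,) * k): 1.0}
    expected = 0.0
    terminated = 0.0
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for (state, acc), mass in frontier.items():
            for prob, succ, rewards in transitions[state]:
                new_acc = tuple(a + r for a, r in zip(acc, rewards))
                if succ is None:
                    expected += mass * prob * combine(new_acc)
                    terminated += mass * prob
                else:
                    next_frontier[(succ, new_acc)] += mass * prob
        frontier = next_frontier
        if not frontier:
            break
    return expected, terminated


# Hypothetical two-objective chain: from "s", stop immediately with
# rewards (1, 0), or move to "t" with rewards (0, 1) and then stop
# with rewards (1, 1).
CHAIN = {
    "s": [(0.5, None, (1, 0)), (0.5, "t", (0, 1))],
    "t": [(1.0, None, (1, 1))],
}

value_min, mass_min = expected_terminal_value(CHAIN, "s", min, 10)
value_max, mass_max = expected_terminal_value(CHAIN, "s", max, 10)
```

In this toy chain the two runs end with accumulated vectors (1, 0) and (1, 2), each with probability 0.5, so the choice of `combine` changes the result. For chains that can loop, the set of reachable configurations grows with the cutoff, which is why a truncated unfolding like this yields only a partial sum over the runs that have terminated within `max_steps`.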

Keywords

• Markov decision processes
• Infinite state systems
• Multi-objective optimization
