Markov Decision Processes and Stochastic Games with Total Effective Payoff

Authors: Endre Boros, Khaled Elbassioni, Vladimir Gurvich, Kazuhisa Makino



Cite As

Endre Boros, Khaled Elbassioni, Vladimir Gurvich, and Kazuhisa Makino. Markov Decision Processes and Stochastic Games with Total Effective Payoff. In 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 30, pp. 103-115, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
https://doi.org/10.4230/LIPIcs.STACS.2015.103

Abstract

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.
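The abstract does not spell out what "total effective payoff" means. A minimal sketch follows, assuming the definition common in the total-reward literature: the long-run (Cesàro) average of the partial sums of the rewards along a play. The function name `cesaro_total_payoff` is hypothetical, and the code only evaluates a finite prefix of a play, so it approximates the limit rather than computing it.

```python
from itertools import accumulate


def cesaro_total_payoff(rewards):
    """Average of the partial sums of a finite reward prefix.

    Assumed definition (not stated in the abstract): for rewards
    r_0, ..., r_T this returns (1 / (T + 1)) * sum_{t=0..T} sum_{j=0..t} r_j.
    """
    partial_sums = list(accumulate(rewards))
    if not partial_sums:
        raise ValueError("need at least one reward")
    return sum(partial_sums) / len(partial_sums)


# A play that alternates rewards +1 and -1 forever has mean payoff 0,
# and its plain running total oscillates between 1 and 0 without converging.
# The Cesaro average of those running totals settles at 1/2.
alternating_prefix = [1, -1] * 50
print(cesaro_total_payoff(alternating_prefix))
```

Under this assumed definition the payoff separates plays that the mean payoff cannot tell apart: every play with mean payoff 0 looks the same to the mean-payoff criterion, while the averaged running total still ranks them.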

Keywords

  • Markov decision processes
  • undiscounted stochastic games
  • linear programming
  • mean payoff
  • total payoff
