Markov Decision Processes and Stochastic Games with Total Effective Payoff

Boros, Endre; Elbassioni, Khaled; Gurvich, Vladimir; Makino, Kazuhisa

doi:10.4230/LIPIcs.STACS.2015.103

Cite As

Endre Boros, Khaled Elbassioni, Vladimir Gurvich, and Kazuhisa Makino. Markov Decision Processes and Stochastic Games with Total Effective Payoff. In 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 30, pp. 103-115, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.STACS.2015.103

Abstract

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.
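
As a rough illustration of the payoff notion, the sketch below assumes that "total effective payoff" means the Cesàro average of the partial sums of rewards, following the total-reward criterion of Thuijsman and Vrieze; the abstract itself does not state the definition, so this reading is an assumption. The function name `cesaro_mean_of_partial_sums` and the sample reward sequence are hypothetical, and the code evaluates only a finite prefix of a single fixed play, not an MDP or a game.

```python
from itertools import accumulate


def cesaro_mean_of_partial_sums(rewards):
    """Average of the running totals r_0, r_0 + r_1, ... over a finite prefix.

    Assumption: this is a finite-horizon stand-in for the limiting
    Cesaro average used in total-reward criteria.
    """
    partial_sums = list(accumulate(rewards))
    if not partial_sums:
        raise ValueError("need at least one reward")
    return sum(partial_sums) / len(partial_sums)


# Hypothetical play with alternating rewards +1, -1: the running totals
# oscillate between 1 and 0, so the plain total has no limit, while the
# average of the running totals settles.
rewards = [1, -1] * 500
value = cesaro_mean_of_partial_sums(rewards)
print(value)
```

For this alternating sequence the running totals are 1, 0, 1, 0, ..., which is why averaging them gives a well-defined value even though the undiscounted sum of rewards does not converge.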

Keywords

Markov decision processes
undiscounted stochastic games
linear programming
mean payoff
total payoff
