The Operator Approach to Entropy Games

Akian, Marianne; Gaubert, Stéphane; Grand-Clément, Julien; Guillaud, Jérémie

doi:10.4230/LIPIcs.STACS.2017.6

File

Subject Classification

Keywords

Stochastic games
Shapley operators
policy iteration
Perron eigenvalues
Risk sensitive control

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

Entropy games and matrix multiplication games have been recently introduced by Asarin et al. They model the situation in which one player (Despot) wishes to minimize the growth rate of a matrix product, whereas the other player (Tribune) wishes to maximize it. We develop an operator approach to entropy games. This allows us to show that entropy games can be cast as stochastic mean payoff games in which some action spaces are simplices and payments are given by a relative entropy (Kullback-Leibler divergence). In this way, we show that entropy games with a fixed number of states belonging to Despot can be solved in polynomial time. This approach also allows us to solve these games by a policy iteration algorithm, which we compare with the spectral simplex algorithm developed by Protasov.

Cite As Get BibTex

Marianne Akian, Stéphane Gaubert, Julien Grand-Clément, and Jérémie Guillaud. The Operator Approach to Entropy Games. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 66, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.STACS.2017.6

Author Details

Marianne Akian

Stéphane Gaubert

Julien Grand-Clément

Jérémie Guillaud

References

M. Akian, S. Gaubert, and A. Guterman. Tropical polyhedra are equivalent to mean payoff games. International Journal of Algebra and Computation, 22(1):125001 (43 pages), 2012. http://arxiv.org/abs/0912.2462, URL: http://dx.doi.org/10.1142/S0218196711006674.
M. Akian, S. Gaubert, and R. Nussbaum. A Collatz-Wielandt characterization of the spectral radius of order-preserving homogeneous maps on cones, 2011. URL: https://arxiv.org/abs/1112.5968.
V. Anantharam and V. S. Borkar. A variational formula for risk-sensitive reward, 2015. arXiv:1501.00676.
D. Andersson and P. B. Miltersen. The complexity of solving stochastic games on graphs. In Proceedings of ISAAC'09, number 5878 in LNCS. Springer, 2009.
E. Asarin, J. Cervelle, A. Degorre, C. Dima, F. Horn, and V. Kozyakin. Entropy games and matrix multiplication games. In 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17-20, 2016, Orléans, France, pages 11:1-11:14, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2016.11.
A. Berman and R. J. Plemmons. Nonnegative matrices in the mathematical sciences. Academic Press, 1994.
V. D. Blondel and Y. Nesterov. Polynomial-time computation of the joint spectral radius for some sets of nonnegative matrices. SIAM J. Matrix Anal., 31(3):865-876, 2009.
J. Bolte, S. Gaubert, and G. Vigeral. Definable zero-sum stochastic games. Mathematics of Operations Research, 40(1):171-191, 2014. http://arxiv.org/abs/1301.1967, URL: http://dx.doi.org/10.1287/moor.2014.0666.
J. M. Borwein and P. B. Borwein. On the complexity of familiar functions and numbers. SIAM Review, 30(4):589-601, 1988.
T. Chen and T. Han. On the complexity of computing maximum entropy for markovian models. In 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, December 15-17, 2014, New Delhi, India, pages 571-583, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.FSTTCS.2014.571.
M. D. Donsker and S. R. S. Varadhan. On a variational formula for the principal eigenvalue for operators with maximum principle. Proc. Nat. Acad. Sci. USA, 72(3):780-783, 1975.
L. van den Dries. Tame topology and o-minimal structures, volume 248 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1998. URL: http://dx.doi.org/10.1017/CBO9780511525919.
W. H. Fleming and D. Hernández-Hernández. Risk-sensitive control of finite state machines on an infinite horizon. I. SIAM J. Control Optim., 35(5):1790-1810, 1997. URL: http://dx.doi.org/10.1137/S0363012995291622.
W. H. Fleming and D. Hernández-Hernández. Risk-sensitive control of finite state machines on an infinite horizon. II. SIAM J. Control Optim., 37(4):1048-1069 (electronic), 1999. URL: http://dx.doi.org/10.1137/S0363012997321498.
S. Gaubert and J. Gunawardena. A non-linear hierarchy for discrete event dynamical systems. In Proc. of the Fourth Workshop on Discrete Event Systems (WODES98), pages 249-254, Cagliari, Italy, 1998. IEE.
S. Gaubert and J. Gunawardena. The Perron-Frobenius theorem for homogeneous, monotone functions. Trans. of AMS, 356(12):4931-4950, 2004. http://arxiv.org/abs/math.FA/0105091, URL: http://dx.doi.org/10.1090/S0002-9947-04-03470-1.
M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169-197, 1981.
T. D. Hansen, P. B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. In Innovations in Computer Science 2011, pages 253-263. Tsinghua University Press, 2011.
A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Management Science. Journal of the Institute of Management Science. Application and Theory Series, 12:359-370, 1966.
V. Kozyakin. Hourglass alternative and the finiteness conjecture for the spectral characteristics of sets of non-negative matrices. arXiv:1507.00492, 2015.
M. Lothaire. Applied Combinatorics on Words. Cambridge, 2005.
R. D. Nussbaum. Convexity and log convexity for the spectral radius. Linear Algebra Appl., 73:59-122, 1986.
V. Yu. Protasov. Spectral simplex method. Mathematical Programming, 2015. URL: http://dx.doi.org/10.1007/s10107-015-0905-2.
M. L. Puterman. Markov Decision Processes. Wiley, 2005.
R. T. Rockafellar and R. J.-B. Wets. Variational analysis. Springer-Verlag, Berlin, 1998. URL: http://dx.doi.org/10.1007/978-3-642-02431-3.
U. G. Rothblum. Multiplicative markov decision chains. Mathematics of Operations Research, 9(1):6-24, 1984.
S. M. Rump. Polynomial minimum root separation. Mathematics of Computation, 145(33):327-336, 1979.
L. van den Dries. o-minimal structures and real analytic geometry. In Current developments in mathematics, 1998 (Cambridge, MA), pages 105-152. Int. Press, Somerville, MA, 1999.
A. J. Wilkie. Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. J. Amer. Math. Soc., 9(4):1051-1094, 1996. URL: http://dx.doi.org/10.1090/S0894-0347-96-00216-0.
Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the markov decision problem with a fixed discount rate, 2011. URL: http://dx.doi.org/10.1287/moor.1110.0516.

The Operator Approach to Entropy Games

Authors Marianne Akian, Stéphane Gaubert, Julien Grand-Clément, Jérémie Guillaud

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message