Document

# Discounted-Sum Automata with Multiple Discount Factors

## File

LIPIcs.CSL.2021.12.pdf
• Filesize: 0.51 MB
• 23 pages

## Cite As

Udi Boker and Guy Hefetz. Discounted-Sum Automata with Multiple Discount Factors. In 29th EACSL Annual Conference on Computer Science Logic (CSL 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 183, pp. 12:1-12:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CSL.2021.12

## Abstract

Discounting the influence of future events is a key paradigm in economics and it is widely used in computer-science models, such as games, Markov decision processes (MDPs), reinforcement learning, and automata. While a single game or MDP may allow for several different discount factors, discounted-sum automata (NDAs) were only studied with respect to a single discount factor. For every integer λ ∈ ℕ⧵{0,1}, as opposed to every λ ∈ ℚ⧵ℕ, the class of NDAs with discount factor λ (λ-NDAs) has good computational properties: it is closed under determinization and under the algebraic operations min, max, addition, and subtraction, and there are algorithms for its basic decision problems, such as automata equivalence and containment. We define and analyze discounted-sum automata in which each transition can have a different integral discount factor (integral NMDAs). We show that integral NMDAs with an arbitrary choice of discount factors are not closed under determinization and under algebraic operations. We then define and analyze a restricted class of integral NMDAs, which we call tidy NMDAs, in which the choice of discount factors depends on the prefix of the word read so far. Tidy NMDAs are as expressive as deterministic integral NMDAs with an arbitrary choice of discount factors, and some of their special cases are NMDAs in which the discount factor depends on the action (alphabet letter) or on the elapsed time. We show that for every function θ that defines the choice of discount factors, the class of θ-NMDAs enjoys all of the above good properties of integral NDAs, as well as the same complexities of the required decision problems. To this end, we also improve the previously known complexities of the decision problems of integral NDAs, and present tight bounds on the size blow-up involved in algebraic operations on them. All our results hold equally for automata on finite words and for automata on infinite words.

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Formal languages and automata theory
##### Keywords
• Automata
• Discounted-sum
• Quantitative verification
• NMDA
• NDA

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Shaull Almagor, Udi Boker, and Orna Kupferman. Discounting in LTL. In proceedings of TACAS, volume 8413 of LNCS, pages 424-439, 2014. URL: https://doi.org/10.1007/978-3-642-54862-8_37.
2. Daniel Andersson. An improved algorithm for discounted payoff games. In proceedings of ESSLLI Student Session, pages 91-98, 2006.
3. Suguman Bansal, Swarat Chaudhuri, and Moshe Y. Vardi. Comparator automata in quantitative verification. In proceedings of FoSSaCS, volume 10803 of LNCS, pages 420-437, 2018. URL: https://doi.org/10.1007/978-3-319-89366-2_23.
4. Udi Boker, Krishnendu Chatterjee, Thomas A. Henzinger, and Orna Kupferman. Temporal specifications with accumulative values. ACM Trans. Comput. Log., 15(4):27:1-27:25, 2014. URL: https://doi.org/10.1145/2629686.
5. Udi Boker and Thomas A. Henzinger. Approximate determinization of quantitative automata. In proceedings of FSTTCS, volume 18 of LIPIcs, pages 362-373, 2012. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2012.362.
6. Udi Boker and Thomas A. Henzinger. Exact and approximate determinization of discounted-sum automata. Log. Methods Comput. Sci., 10(1), 2014. URL: https://doi.org/10.2168/LMCS-10(1:10)2014.
7. Udi Boker, Thomas A. Henzinger, and Jan Otop. The target discounted-sum problem. In proceedings of LICS, pages 750-761, 2015. URL: https://doi.org/10.1109/LICS.2015.74.
8. Udi Boker, Orna Kupferman, and Michal Skrzypczak. How deterministic are good-for-games automata? In proceedings of FSTTCS, volume 93 of LIPIcs, pages 18:1-18:14, 2017. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2017.18.
9. Krishnendu Chatterjee. Markov decision processes with multiple long-run average objectives. In proceedings of FSTTCS, volume 4855 of LNCS, pages 473-484. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-77050-3_39.
10. Krishnendu Chatterjee, Laurent Doyen, Herbert Edelsbrunner, Thomas A. Henzinger, and Philippe Rannou. Mean-payoff automaton expressions. In proceedings of CONCUR, volume 6269 of LNCS, pages 269-283, 2010. URL: https://doi.org/10.1007/978-3-642-15375-4_19.
11. Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Alternating weighted automata. In proceedings of FCT, volume 5699 of LNCS, pages 3-13, 2009. URL: https://doi.org/10.1007/978-3-642-03409-1_2.
12. Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Probabilistic weighted automata. In proceedings of CONCUR, volume 5710 of LNCS, pages 244-258, 2009. URL: https://doi.org/10.1007/978-3-642-04081-8_17.
13. Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Expressiveness and closure properties for quantitative languages. Log. Methods Comput. Sci., 6(3), 2010. URL: http://arxiv.org/abs/1007.4018.
14. Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Quantitative languages. ACM Trans. Comput. Log., 11(4):23:1-23:38, 2010. URL: https://doi.org/10.1145/1805950.1805953.
15. Krishnendu Chatterjee, Vojtech Forejt, and Dominik Wojtczak. Multi-objective discounted reward verification in graphs and mdps. In proceedings of LPAR, volume 8312 of LNCS, pages 228-242, 2013. URL: https://doi.org/10.1007/978-3-642-45221-5_17.
16. Luca de Alfaro, Marco Faella, Thomas A. Henzinger, Rupak Majumdar, and Mariëlle Stoelinga. Model checking discounted temporal properties. Theor. Comput. Sci., 345(1):139-170, 2005. URL: https://doi.org/10.1016/j.tcs.2005.07.033.
17. Luca de Alfaro, Thomas A. Henzinger, and Rupak Majumdar. Discounting the future in systems theory. In proceedings of ICALP, volume 2719, pages 1022-1037, 2003. URL: https://doi.org/10.1007/3-540-45061-0_79.
18. Aldric Degorre, Laurent Doyen, Raffaella Gentilini, Jean-François Raskin, and Szymon Torunczyk. Energy and mean-payoff games with imperfect information. In proceedings of CSL, volume 6247 of LNCS, pages 260-274, 2010. URL: https://doi.org/10.1007/978-3-642-15205-4_22.
19. Manfred Droste and Dietrich Kuske. Skew and infinitary formal power series. Theor. Comput. Sci., 366(3):199-227, 2006. URL: https://doi.org/10.1016/j.tcs.2006.08.024.
20. Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. Finite-valued weighted automata. In proceedings of FSTTCS, volume 29 of LIPIcs, pages 133-145, 2014. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2014.133.
21. Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. Quantitative languages defined by functional automata. Log. Methods Comput. Sci., 11(3), 2015. URL: https://doi.org/10.2168/LMCS-11(3:14)2015.
22. Vincent François-Lavet, Raphaël Fonteneau, and Damien Ernst. How to discount deep reinforcement learning: Towards new dynamic strategies. CoRR, 2015. URL: http://arxiv.org/abs/1512.02011.
23. Hugo Gimbert and Wieslaw Zielonka. Limits of multi-discounted markov decision processes. In proceedings of LICS, pages 89-98, 2007. URL: https://doi.org/10.1109/LICS.2007.28.
24. Guy Hefetz. Discounted-sum automata with multiple discount factors. Master’s thesis, IDC, Herzliya, Israel, 2020. URL: https://www.idc.ac.il/en/schools/cs/research/documents/thesis-guy-hefetz.pdf.
25. Thomas A. Henzinger and Nir Piterman. Solving games without determinization. In proceedings of CSL, volume 4207 of LNCS, pages 395-410, 2006. URL: https://doi.org/10.1007/11874683_26.
26. Galina Jirásková. State complexity of some operations on binary regular languages. Theor. Comput. Sci., 330(2):287-298, 2005. URL: https://doi.org/10.1016/j.tcs.2004.04.011.
27. Yafim Kazak, Clark W. Barrett, Guy Katz, and Michael Schapira. Verifying deep-rl-driven systems. In proceedings of NetAI@SIGCOMM, pages 83-89, 2019. URL: https://doi.org/10.1145/3341216.3342218.
28. Tor Lattimore and Marcus Hutter. Time consistent discounting. In proceedings of ALT, volume 6925 of LNCS, pages 383-397, 2011. URL: https://doi.org/10.1007/978-3-642-24412-4_30.
29. Fernando Luque-Vásquez and J. Adolfo Minjárez-Sosa. Iteration Algorithms in Markov Decision Processes with State-Action-Dependent Discount Factors and Unbounded Costs, chapter 4, pages 55-69. Operations Research: the Art of Making Good Decisions. IntechOpen, 2017. URL: https://doi.org/10.5772/65044.
30. Omid Madani, Mikkel Thorup, and Uri Zwick. Discounted deterministic markov decision processes and discounted all-pairs shortest paths. ACM Trans. Algorithms, 6(2):33:1-33:25, 2010. URL: https://doi.org/10.1145/1721837.1721849.
31. Albert R. Meyer and Larry J. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In proceedings of 13th IEEE Symp. on Switching and Automata Theory, pages 125-129, 1972. URL: https://doi.org/10.1109/SWAT.1972.29.
32. Chris Reinke, Eiji Uchibe, and Kenji Doya. Average reward optimization with multiple discounting reinforcement learners. In proceedings of ICONIP, volume 10634 of LNCS, pages 789-800, 2017. URL: https://doi.org/10.1007/978-3-319-70087-8_81.
33. William J. Sakoda and Michael Sipser. Nondeterminism and the size of two way finite automata. In proceedings of STOC, pages 275-286, 1978. URL: https://doi.org/10.1145/800133.804357.
34. Richard S. Sutton and Andrew G.Barto. Introduction to Reinforcement Learning. MIT Press, 1998. URL: http://dl.acm.org/doi/book/10.5555/551283.
35. Takashi Tomita, Shin Hiura, Shigeki Hagihara, and Naoki Yonezaki. A temporal logic with mean-payoff constraints. In proceedings of ICFEM, volume 7635 of LNCS, pages 249-265. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-34281-3_19.
36. Moshe Y. Vardi. Automatic verification of probabilistic concurrent finite-state programs. In proceedings of FOCS, pages 327-338, 1985. URL: https://doi.org/10.1109/SFCS.1985.12.
37. Yufei Wang, Qiwei Ye, and Tie-Yan Liu. Beyond exponentially discounted sum: Automatic learning of return function. CoRR, 2019. URL: http://arxiv.org/abs/1905.11591.
38. Xiao Wu and Xianping Guo. Convergence of Markov decision processes with constraints and state-action dependent discount factors. Sci. China Math., 63:167-182, 2020. URL: https://doi.org/10.1007/s11425-017-9292-1.
39. Uri Zwick and Mike Paterson. The complexity of mean payoff games on graphs. Theor. Comput. Sci., 158:343-359, 1996. URL: https://doi.org/10.1016/0304-3975(95)00188-3.
X

Feedback for Dagstuhl Publishing