Abstraction-Based Decision Making for Statistical Properties (Invited Talk)

Authors: Filip Cano, Thomas A. Henzinger, Bettina Könighofer, Konstantin Kueffner, Kaushik Mallik




File

LIPIcs.FSCD.2024.2.pdf
  • Filesize: 1.32 MB
  • 17 pages


Author Details

Filip Cano
  • Graz University of Technology, Austria
Thomas A. Henzinger
  • Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
Bettina Könighofer
  • Graz University of Technology, Austria
Konstantin Kueffner
  • Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
Kaushik Mallik
  • Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria

Cite As

Filip Cano, Thomas A. Henzinger, Bettina Könighofer, Konstantin Kueffner, and Kaushik Mallik. Abstraction-Based Decision Making for Statistical Properties (Invited Talk). In 9th International Conference on Formal Structures for Computation and Deduction (FSCD 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 299, pp. 2:1-2:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.FSCD.2024.2

Abstract

Sequential decision-making in probabilistic environments is a fundamental problem with many applications in AI and economics. In this paper, we present an algorithm for synthesizing sequential decision-making agents that optimize statistical properties such as maximum and average response times. In the general setting of sequential decision-making, the environment is modeled as a random process that generates inputs. The agent responds to each input, aiming to maximize rewards and minimize costs within a specified time horizon. The corresponding synthesis problem is known to be PSPACE-hard. We consider the special case where the input distribution, reward, and cost depend on input-output statistics specified by counter automata. For such problems, this paper presents the first PTIME synthesis algorithms. We introduce the notion of statistical abstraction, which clusters statistically indistinguishable input-output sequences into equivalence classes. This abstraction allows for a dynamic programming algorithm whose complexity grows polynomially with the considered horizon, making the statistical case exponentially more efficient than the general case. We evaluate our algorithm on three different application scenarios of a client-server protocol, where multiple clients compete via bidding to gain access to the service offered by the server. The synthesized policies optimize profit while guaranteeing that none of the server’s clients is disproportionately starved of the service.
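The idea of running dynamic programming over statistically indistinguishable histories can be illustrated with a minimal sketch. This is not the paper's algorithm or its counter-automaton formalism. It is a hypothetical toy instance in which two clients bid each round, the server serves one of them, and a single counter (how often client 0 was served) is assumed to be the only statistic that matters for the final fairness check. The names `BIDS`, `MAX_GAP`, `PENALTY`, `solve_abstract`, and `solve_concrete` are all illustrative assumptions.

```python
from functools import lru_cache

# Hypothetical toy instance (not taken from the paper):
# each round, a pair of bids (client 0, client 1) is drawn with the given probability.
BIDS = [((1, 3), 0.5), ((2, 1), 0.5)]
HORIZON = 6
MAX_GAP = 2       # assumed fairness bound on |served_0 - served_1| at the horizon
PENALTY = 100.0   # assumed cost charged when the fairness bound is violated


def terminal_value(served0, horizon):
    """Cost at the horizon, which depends only on the counter value."""
    served1 = horizon - served0
    return -PENALTY if abs(served0 - served1) > MAX_GAP else 0.0


def solve_abstract(horizon=HORIZON):
    """Dynamic programming over abstract states (time step, counter value)."""

    @lru_cache(maxsize=None)
    def value(t, served0):
        if t == horizon:
            return terminal_value(served0, horizon)
        total = 0.0
        for bids, prob in BIDS:
            # Serve client 0 (counter increments) or client 1 (counter unchanged).
            best = max(
                bids[0] + value(t + 1, served0 + 1),
                bids[1] + value(t + 1, served0),
            )
            total += prob * best
        return total

    optimum = value(0, 0)
    # The cache size equals the number of abstract states that were visited.
    return optimum, value.cache_info().currsize


def solve_concrete(horizon=HORIZON):
    """Reference solver that branches on the full input-output history."""

    def value(history):
        if len(history) == horizon:
            served0 = sum(1 for _, action in history if action == 0)
            return terminal_value(served0, horizon)
        total = 0.0
        for bids, prob in BIDS:
            best = max(
                bids[a] + value(history + ((bids, a),)) for a in (0, 1)
            )
            total += prob * best
        return total

    return value(())


if __name__ == "__main__":
    optimum, abstract_states = solve_abstract()
    print("optimal expected value:", optimum)
    print("abstract states visited:", abstract_states)
```

In this toy, histories with the same length and the same counter value fall into one equivalence class, so the abstract solver visits a number of states that is quadratic in the horizon, while the reference solver enumerates exponentially many histories. Both compute the same optimal value, which is why the reference solver is only practical as a cross-check on short horizons.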

Subject Classification

ACM Subject Classification
  • Theory of computation → Online algorithms
  • Theory of computation → Computational pricing and auctions
  • Theory of computation → Abstraction
Keywords
  • Abstract interpretation
  • Sequential decision making
  • Counter machines

