Reinforcement Planning for Effective ε-Optimal Policies in Dense Time with Discontinuities

Authors: Léo Henry, Blaise Genest, Alexandre Drewery



Author Details

Léo Henry
  • University College London, UK
Blaise Genest
  • CNRS and CNRS@CREATE, IPAL, France
  • Institute for Infocomm Research (I2R), Singapore
Alexandre Drewery
  • ENS Rennes, France

Cite As

Léo Henry, Blaise Genest, and Alexandre Drewery. Reinforcement Planning for Effective ε-Optimal Policies in Dense Time with Discontinuities. In 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 284, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.FSTTCS.2023.13

Abstract

The model of (Decision) Stochastic Timed Automata (DSTA) was recently proposed to model Cyber-Physical Systems that combine dense time (the physical part) with discrete actions and discontinuities such as timeouts (the cyber part). The state-of-the-art results on controlling DSTAs are, however, not ideal: in the infinite-horizon case, optimal controllers do not exist, while for time-bounded behaviors, it is not known how to build such controllers, even ε-optimal ones. In this paper, we develop a theory of Reinforcement Planning in the setting of DSTAs, for discounted infinite-horizon objectives. We show that optimal controllers do exist in general. Further, for DSTAs with 1 clock (which already generalize Continuous Time MDPs with, e.g., timeouts), we provide an effective procedure to compute ε-optimal controllers. Notably, we do not rely on a discretization of the time space, but use symbolic representations instead. Evaluation on a DSTA shows that this method can be more efficient than discretization. Last, we show with a counterexample that this is as far as the construction can go: it cannot be extended to 2 or more clocks.

Subject Classification

ACM Subject Classification
  • Theory of computation → Timed and hybrid models
Keywords
  • reinforcement planning
  • timed automata
  • planning
