RAVEN: Reinforcement Learning for Generating Verifiable Run-Time Requirement Enforcers for MPSoCs

Authors Khalil Esper , Jan Spieck , Pierre-Louis Sixdenier , Stefan Wildermann , Jürgen Teich

Thumbnail PDF


  • Filesize: 0.87 MB
  • 16 pages

Document Identifiers

Author Details

Khalil Esper
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Jan Spieck
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Pierre-Louis Sixdenier
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Stefan Wildermann
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Jürgen Teich
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany

Cite AsGet BibTex

Khalil Esper, Jan Spieck, Pierre-Louis Sixdenier, Stefan Wildermann, and Jürgen Teich. RAVEN: Reinforcement Learning for Generating Verifiable Run-Time Requirement Enforcers for MPSoCs. In Fourth Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2023). Open Access Series in Informatics (OASIcs), Volume 108, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


In embedded systems, applications frequently have to meet non-functional requirements regarding, e.g., real-time or energy consumption constraints, when executing on a given MPSoC target platform. Feedback-based controllers have been proposed that react to transient environmental factors by adapting the DVFS settings or degree of parallelism following some predefined control strategy. However, it is, in general, not possible to give formal guarantees for the obtained controllers to satisfy a given set of non-functional requirements. Run-time requirement enforcement has emerged as a field of research for the enforcement of non-functional requirements at run-time, allowing to define and formally verify properties on respective control strategies specified by automata. However, techniques for the automatic generation of such controllers have not yet been established. In this paper, we propose a technique using reinforcement learning to automatically generate verifiable feedback-based enforcers. For that, we train a control policy based on a representative input sequence at design time. The learned control strategy is then transformed into a verifiable enforcement automaton which constitutes our run-time control model that can handle unseen input data. As a case study, we apply the approach to generate controllers that are able to increase the probability of satisfying a given set of requirement verification goals compared to multiple state-of-the-art approaches, as can be verified by model checkers.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Multicore architectures
  • Theory of computation → Linear logic
  • Theory of computation → Modal and temporal logics
  • Hardware → Finite state machines
  • Computer systems organization → Self-organizing autonomic computing
  • Theory of computation → Verification by model checking
  • Mathematics of computing → Probabilistic representations
  • Computing methodologies → Reinforcement learning
  • Verification
  • Runtime Requirement Enforcement
  • Reinforcement Learning


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Nidhi Anantharajaiah, Tamim Asfour, Michael Bader, Lars Bauer, Jürgen Becker, Simon Bischof, Marcel Brand, Hans-Joachim Bungartz, Christian Eichler, Khalil Esper, Joachim Falk, Nael Fasfous, Felix Freiling, Andreas Fried, Michael Gerndt, Michael Glaß, Jeferson Gonzalez, Frank Hannig, Christian Heidorn, Jörg Henkel, Andreas Herkersdorf, Benedict Herzog, Jophin John, Timo Hönig, Felix Hundhausen, Heba Khdr, Tobias Langer, Oliver Lenke, Fabian Lesniak, Alexander Lindermayr, Alexandra Listl, Sebastian Maier, Nicole Megow, Marcel Mettler, Daniel Müller-Gritschneder, Hassan Nassar, Fabian Paus, Alexander Pöppl, Behnaz Pourmohseni, Jonas Rabenstein, Phillip Raffeck, Martin Rapp, Santiago Narváez Rivas, Mark Sagi, Franziska Schirrmacher, Ulf Schlichtmann, Florian Schmaus, Wolfgang Schröder-Preikschat, Tobias Schwarzer, Mohammed Bakr Sikal, Bertrand Simon, Gregor Snelting, Jan Spieck, Akshay Srivatsa, Walter Stechele, Jürgen Teich, Isaías A. Comprés Ureña, Ingrid Verbauwhede, Dominik Walter, Thomas Wild, Stefan Wildermann, Mario Wille, Michael Witterauf, and Li Zhang. Invasive Computing. FAU University Press, 2022. Google Scholar
  2. Christel Baier, Boudewijn R. Haverkort, Holger Hermanns, and Joost-Pieter Katoen. On the Logical Characterisation of Performability Properties. In Automata, Languages and Programming, 27th International Colloquium, ICALP 2000, Geneva, Switzerland, July 9-15, 2000, Proceedings, volume 1853 of Lecture Notes in Computer Science, pages 780-792. Springer, 2000. Google Scholar
  3. Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. Verifiable Reinforcement Learning via Policy Extraction. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 2499-2509, Red Hook, NY, USA, 2018. Curran Associates Inc. Google Scholar
  4. Roderick Bloem, Bettina Könighofer, Robert Könighofer, and Chao Wang. Shield Synthesis. In International conference on tools and algorithms for the construction and analysis of systems, pages 533-548. Springer, 2015. Google Scholar
  5. Edmund M. Clarke and E. Allen Emerson. Design and Synthesis of Synchronization Skeletons Using Branching-Time Temporal Logic. In Dexter Kozen, editor, Logics of Programs, Workshop, Yorktown Heights, New York, USA, May 1981, volume 131 of Lecture Notes in Computer Science, pages 52-71. Springer, 1981. Google Scholar
  6. Dakshina Dasari, Benny Akesson, Vincent Nélis, Muhammad Ali Awan, and Stefan M. Petters. Identifying the sources of unpredictability in cots-based multicore systems. In 8th IEEE International Symposium on Industrial Embedded Systems, SIES 2013, Porto, Portugal, June 19-21, 2013, pages 39-48. IEEE, 2013. Google Scholar
  7. Yvan Debizet, Guénolé Lallement, Fady Abouzeid, Philippe Roche, and Jean-Luc Autran. Q-Learning-based Adaptive Power Management for IoT System-on-Chips with Embedded Power States. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1-5, 2018. Google Scholar
  8. Christophe Dubach, Timothy M Jones, Edwin V Bonilla, Michael FP O'Boyle, et al. A Predictive Model for Dynamic Microarchitectural Adaptivity Control. In MICRO, volume 43, pages 485-496, 2010. Google Scholar
  9. Khalil Esper, Stefan Wildermann, and Jürgen Teich. A comparative evaluation of latency-aware energy optimization approaches in many-core systems. In Second Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021. Google Scholar
  10. Khalil Esper, Stefan Wildermann, and Jürgen Teich. Enforcement FSMs: Specification and Verification of Non-Functional Properties of Program Executions on MPSoCs. In Proceedings of the 19th ACM-IEEE International Conference on Formal Methods and Models for System Design, pages 21-31, 2021. Google Scholar
  11. Khalil Esper, Stefan Wildermann, and Jürgen Teich. Multi-Requirement Enforcement of Non-Functional Properties on MPSoCs Using Enforcement FSMs-A Case Study. In Third Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022. Google Scholar
  12. Yliès Falcone, Laurent Mounier, Jean-Claude Fernandez, and Jean-Luc Richier. Runtime Enforcement Monitors: Composition, Synthesis, and Enforcement Abilities. Formal Methods in System Design, 38(3):223-262, 2011. Google Scholar
  13. Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision Meets Robotics: The KITTI Dataset. The International Journal of Robotics Research, 32(11):1231-1237, 2013. Google Scholar
  14. Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous Deep Q-Learning with Model-Based Acceleration. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML'16, pages 2829-2838. JMLR.org, 2016. Google Scholar
  15. Ujjwal Gupta, Sumit K. Mandal, Manqing Mao, Chaitali Chakrabarti, and Umit Y. Ogras. A Deep Q-Learning Approach for Dynamic Management of Heterogeneous Processors. IEEE Computer Architecture Letters, 18(1):14-17, 2019. Google Scholar
  16. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In International conference on machine learning, pages 1861-1870. PMLR, 2018. Google Scholar
  17. Hans Hansson and Bengt Jonsson. A Logic for Reasoning about Time and Reliability. Formal Aspects of Computing, 6(5):512-535, 1994. Google Scholar
  18. Mohammadhosein Hasanbeig, Daniel Kroening, and Alessandro Abate. Towards verifiable and safe model-free reinforcement learning. In Nicola Gigante, Federico Mari, and Andrea Orlandini, editors, Proceedings of the 1st Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, co-located with the 18th International Conference of the Italian Association for Artificial Intelligence, OVERLAY@AI*IA 2019, Rende, Italy, November 19-20, 2019, volume 2509 of CEUR Workshop Proceedings, page 1. CEUR-WS.org, 2019. Google Scholar
  19. Connor Imes, David HK Kim, Martina Maggio, and Henry Hoffmann. POET: A Portable Approach to Minimizing Energy under Soft Real-Time Constraints. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium, pages 75-86. IEEE, 2015. Google Scholar
  20. Peng Jin, Jiaxu Tian, Dapeng Zhi, Xuejun Wen, and Min Zhang. Trainify: A CEGAR-Driven Training and Verification Framework for Safe Deep Reinforcement Learning. In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification, volume 13371, pages 193-218. Springer International Publishing, Cham, 2022. Series Title: Lecture Notes in Computer Science. URL: https://link.springer.com/10.1007/978-3-031-13185-1_10.
  21. David H. K. Kim, Connor Imes, and Henry Hoffmann. Racing and Pacing to Idle: Theoretical and Empirical Analysis of Energy Optimization Heuristics. In 2015 IEEE 3rd International Conference on Cyber-Physical Systems, Networks, and Applications, CPSNA 2015, Kowloon, Hong Kong, China, August 19-21, 2015, pages 78-85. IEEE Computer Society, 2015. Google Scholar
  22. Marta Z. Kwiatkowska, Gethin Norman, and David Parker. Quantitative Analysis With the Probabilistic Model Checker PRISM. Electron. Notes Theor. Comput. Sci., 153(2):5-31, 2006. Google Scholar
  23. Marta Z. Kwiatkowska, Gethin Norman, and David Parker. PRISM 4.0: Verification of Probabilistic Real-Time Systems. In Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, volume 6806 of Lecture Notes in Computer Science, pages 585-591. Springer, 2011. Google Scholar
  24. David G Lowe. Object Recognition from Local Scale-Invariant Features. In Proceedings of the seventh IEEE international conference on computer vision, volume 2, pages 1150-1157. IEEE, 1999. Google Scholar
  25. Sumit K Mandal, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande, and Umit Y Ogras. An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms. ACM Transactions on Design Automation of Electronic Systems (TODAES), 25(3):1-26, 2020. Google Scholar
  26. Sumit K. Mandal, Ganapati Bhat, Chetan Arvind Patil, Janardhan Rao Doppa, Partha Pratim Pande, and Umit Y. Ogras. Dynamic Resource Management of Heterogeneous Mobile Platforms via Imitation Learning. IEEE Trans. Very Large Scale Integr. Syst., 27(12):2842-2854, 2019. Google Scholar
  27. Anway Mukherjee and Thidapat Chantem. Energy Management of Applications With Varying Resource Usage on Smartphones. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 37(11):2416-2427, 2018. Google Scholar
  28. Thannirmalai Somu Muthukaruppan, Mihai Pricopi, Vanchinathan Venkataramani, Tulika Mitra, and Sanjay Vishin. Hierarchical Power Management for Asymmetric Multi-Core in Dark Silicon Era. In The 50th Annual Design Automation Conference 2013, DAC '13, Austin, TX, USA, May 29 - June 07, 2013, pages 174:1-174:9. ACM, 2013. Google Scholar
  29. Behnaz Pourmohseni, Michael Glaß, Jörg Henkel, Heba Khdr, Martin Rapp, Valentina Richthammer, Tobias Schwarzer, Fedor Smirnov, Jan Spieck, Jürgen Teich, et al. Hybrid Application Mapping for Composable Many-Core Systems: Overview and Future Perspective. Journal of Low Power Electronics and Applications, 10(4):38, 2020. Google Scholar
  30. Fred B Schneider. Enforceable Security Policies. ACM Transactions on Information and System Security (TISSEC), 3(1):30-50, 2000. Google Scholar
  31. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. arXiv preprint, 2017. URL: http://arxiv.org/abs/1707.06347.
  32. David C Snowdon, Etienne Le Sueur, Stefan M Petters, and Gernot Heiser. Koala: A Platform for OS-Level Power Management. In Proceedings of the 4th ACM European conference on Computer systems, pages 289-302, 2009. Google Scholar
  33. Jan Spieck, Stefan Wildermann, and Jürgen Teich. Scenario-Based Soft Real-Time Hybrid Application Mapping for MPSoCs. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1-6. IEEE, 2020. Google Scholar
  34. Jan Spieck, Stefan Wildermann, and Jürgen Teich. Domain-Adaptive Soft Real-Time Hybrid Application Mapping for MPSoCs. In 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), pages 1-6. IEEE, 2021. Google Scholar
  35. Jan Spieck, Stefan Wildermann, and Jürgen Teich. A Learning-Based Methodology for Scenario-Aware Mapping of Soft Real-Time Applications onto Heterogeneous MPSoCs. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2022. Google Scholar
  36. Jan Spieck, Stefan Wildermann, and Jürgen Teich. On Transferring Application Mapping Knowledge Between Differing MPSoC Architectures. In CODES+ISSS 2022, 2022. Google Scholar
  37. Richard S. Sutton and Andrew G. Barto. Reinforcement Learning - An Introduction. Adaptive computation and machine learning. MIT Press, 1998. URL: https://www.worldcat.org/oclc/37293240.
  38. Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and Compilation of Parallel Programs for-Predictable MPSoC Execution Using Invasive Computing. In 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC), pages 313-320. IEEE, 2016. Google Scholar
  39. Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive Computing: An Overview. Multiprocessor System-on-Chip, pages 241-268, 2011. Google Scholar
  40. Jürgen Teich, Pouya Mahmoody, Behnaz Pourmohseni, Sascha Roloff, Wolfgang Schröder-Preikschat, and Stefan Wildermann. Run-Time Enforcement of Non-functional Program Properties on MPSoCs. In A Journey of Embedded and Cyber-Physical Systems, pages 125-149. Springer, 2021. Google Scholar
  41. Jürgen Teich, Behnaz Pourmohseni, Oliver Keszöcze, Jan Spieck, and Stefan Wildermann. Run-Time Enforcement of Non-Functional Application Requirements in Heterogeneous Many-Core Systems. In 25th Asia and South Pacific Design Automation Conference, ASP-DAC 2020, Beijing, China, January 13-16, 2020, pages 629-636. IEEE, 2020. Google Scholar
  42. Xiaohang Wang, Amit Kumar Singh, Bing Li, Yang Yang, Hong Li, and Terrence Mak. Bubble Budgeting: Throughput Optimization for Dynamic Workloads by Exploiting Dark Cores in Many Core Systems. IEEE Transactions on Computers, 67(2):178-192, 2017. Google Scholar
  43. Christopher John Cornish Hellaby Watkins. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, UK, May 1989. Google Scholar
  44. Amir Yeganeh-Khaksar, Mohsen Ansari, Sepideh Safari, Sina Yari-Karin, and Alireza Ejlali. Ring-DVFS: Reliability-Aware Reinforcement Learning-Based DVFS for Real-Time Embedded Systems. IEEE Embedded Systems Letters, 13(3):146-149, 2021. Google Scholar