MonTM: Monitoring-Based Thermal Management for Mixed-Criticality Systems

Authors: Marcel Mettler, Martin Rapp, Heba Khdr, Daniel Mueller-Gritschneder, Jörg Henkel, Ulf Schlichtmann

Author Details

Marcel Mettler
  • Chair of Electronic Design Automation, Technische Universität München, Germany
Martin Rapp
  • Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany
Heba Khdr
  • Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany
Daniel Mueller-Gritschneder
  • Chair of Electronic Design Automation, Technische Universität München, Germany
Jörg Henkel
  • Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany
Ulf Schlichtmann
  • Chair of Electronic Design Automation, Technische Universität München, Germany

Cite As

Marcel Mettler, Martin Rapp, Heba Khdr, Daniel Mueller-Gritschneder, Jörg Henkel, and Ulf Schlichtmann. MonTM: Monitoring-Based Thermal Management for Mixed-Criticality Systems. In 14th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 12th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2023). Open Access Series in Informatics (OASIcs), Volume 107, pp. 5:1-5:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

With the rapidly growing functionality of embedded real-time applications, it becomes inevitable to integrate tasks of different safety integrity levels on one many-core processor, leading to a large-scale mixed-criticality system. In this process, isolating shared architectural resources is not sufficient, because tasks executing on different cores can also interfere via the many-core processor’s thermal management. As a result, best-effort tasks can cause deadline violations for safety-critical tasks. To prevent such a scenario, we propose a monitoring-based hardware extension that communicates imminent thermal violations between cores via a lightweight interconnect. Building on this infrastructure, we propose a thermal strategy in which best-effort tasks can be throttled in favor of safety-critical tasks. Furthermore, assigning a static voltage/frequency (V/f) level to each safety-critical task based on its worst-case execution time may result in unnecessarily high V/f levels when the actual execution finishes faster. To free the otherwise wasted thermal resources, our solution monitors the progress of safety-critical tasks to detect slack and safely reduce their V/f levels. This increases the thermal headroom for best-effort tasks, boosting their performance. In our evaluation, we demonstrate our approach on an 80-core processor and show that it satisfies the thermal and deadline requirements while reducing the run-time of best-effort tasks by up to 45% compared to the state of the art.
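The two ideas in the abstract, slack-based V/f reduction for safety-critical tasks and throttling of best-effort tasks on a thermal alarm, can be illustrated with a minimal software sketch. This is not the authors' hardware mechanism: the function names, the cycle-based progress model, and the "lower every best-effort core by one level" policy are all assumptions made purely for illustration.

```python
from typing import Dict, Sequence, Set


def lowest_safe_frequency(remaining_wc_cycles: float,
                          time_left_s: float,
                          levels_hz: Sequence[float]) -> float:
    """Pick the lowest frequency that still meets the deadline.

    Assumption: the remaining worst-case work is known in cycles and
    execution time scales as cycles / frequency.
    """
    if time_left_s <= 0:
        # No time left: fall back to the highest available level.
        return max(levels_hz)
    for f in sorted(levels_hz):
        if remaining_wc_cycles / f <= time_left_s:
            return f
    # No level is fast enough: fall back to the highest available level.
    return max(levels_hz)


def throttle_best_effort(core_levels: Dict[str, int],
                         best_effort: Set[str]) -> Dict[str, int]:
    """React to an imminent thermal violation (hypothetical policy).

    Lowers the V/f level index of every best-effort core by one step
    (never below 0) and leaves safety-critical cores untouched.
    """
    return {
        core: max(level - 1, 0) if core in best_effort else level
        for core, level in core_levels.items()
    }


if __name__ == "__main__":
    levels = [1e9, 2e9, 3e9]  # hypothetical V/f levels in Hz
    # A safety-critical task that is ahead of schedule can run slower.
    print(lowest_safe_frequency(2e9, 1.5, levels))
    # On a thermal alarm, only best-effort cores are slowed down.
    print(throttle_best_effort({"c0": 2, "c1": 1, "c2": 0}, {"c1", "c2"}))
```

In this sketch, the more slack a safety-critical task has accumulated, the lower the level `lowest_safe_frequency` returns, which is what would free thermal headroom for best-effort cores.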

Subject Classification

ACM Subject Classification
  • Hardware → On-chip resource management
  • Computer systems organization → Embedded and cyber-physical systems

Keywords
  • Dynamic thermal management
  • mixed-criticality
  • monitoring


