Governing with Insights: Towards Profile-Driven Cache Management of Black-Box Applications

Authors: Golsana Ghaemi, Dharmesh Tarapore, Renato Mancuso

  • 25 pages

Author Details

Golsana Ghaemi
  • Boston University, MA, USA
Dharmesh Tarapore
  • Boston University, MA, USA
Renato Mancuso
  • Boston University, MA, USA

Cite As

Golsana Ghaemi, Dharmesh Tarapore, and Renato Mancuso. Governing with Insights: Towards Profile-Driven Cache Management of Black-Box Applications. In 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 196, pp. 4:1-4:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


There exists a divide between the ever-increasing demand for high-performance embedded systems and the availability of practical methodologies to understand the interplay of complex data-intensive applications with hardware memory resources. On the one hand, traditional static analysis approaches are seldom applicable to latest-generation multi-core platforms due to a lack of accurate micro-architectural models. On the other hand, measurement-based methods only provide coarse-grained information about the end-to-end execution of a given real-time application. In this paper, we describe a novel methodology, namely Black-Box Profiling (BBProf), to gather fine-grained insights on the usage of cache resources in applications of realistic complexity. The goal of our technique is to extract the relative importance of individual memory pages towards the overall temporal behavior of a target application. Importantly, BBProf does not require the semantics of the target application to be known (i.e., applications are treated as black boxes), and it does not rely on any platform-specific hardware support. We provide an open-source full-system implementation and showcase how BBProf can be used to perform profile-driven cache management.
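To make "profile-driven cache management" concrete, the sketch below shows one simple way a per-page profile could be consumed. It is a minimal illustration under stated assumptions, not the BBProf implementation: it assumes the profile is already available as a mapping from a page frame number to a numeric importance score (higher meaning the page matters more to the application's timing), assumes 4 KiB pages, and uses a hypothetical helper `select_pages` that greedily keeps the highest-scoring pages until a cache budget is exhausted.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def select_pages(profile: Dict[int, float], budget_bytes: int,
                 page_size: int = PAGE_SIZE) -> List[int]:
    """Hypothetical helper: pick the pages to prioritize in the cache.

    `profile` maps a page frame number to an importance score (assumed
    to come from a prior profiling run). Pages with non-positive scores
    are ignored; the rest are taken in descending score order (ties
    broken by ascending page number) until `budget_bytes` is filled.
    """
    max_pages = budget_bytes // page_size
    ranked = sorted(
        (item for item in profile.items() if item[1] > 0),
        key=lambda item: (-item[1], item[0]),
    )
    return [pfn for pfn, _score in ranked[:max_pages]]


if __name__ == "__main__":
    # Illustrative, made-up scores for four pages.
    profile = {0x100: 0.5, 0x101: 12.0, 0x102: 3.2, 0x103: 0.0}
    print([hex(p) for p in select_pages(profile, 2 * PAGE_SIZE)])  # ['0x101', '0x102']
```

A greedy ranking is the simplest policy that uses the relative importance of pages directly; the selected pages could then, for example, be mapped to a dedicated cache partition via page coloring, while the remaining pages share what is left.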

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time system architecture
Keywords
  • Cache Profiling
  • WSS Estimation
  • Cache Interference
  • Real-time
  • Multicore
  • Contention-induced Instruction Stall
  • C2IS
  • Coloring
  • Cache Management
  • Cacheability


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Siemens AG. Jailhouse, 2014. URL:
  2. ARM Holdings. Cortex-A53 MPCore technical reference manual (r0p4), 2018. URL:
  3. I. Ashraf, M. Taouil, and K. Bertels. Memory profiling for intra-application data-communication quantification: A survey. In 2015 10th International Design Test Symposium (IDT), pages 32-37, 2015. URL:
  4. F. Bouquillon, C. Ballabriga, G. Lipari, and S. Niar. A wcet-aware cache coloring technique for reducing interference in real-time systems. CoRR, abs/1903.09310, 2019. URL:
  5. D. Bruening, T. Garnett, and S. Amarasinghe. An infrastructure for adaptive dynamic optimization. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, CGO '03, page 265–275, USA, 2003. IEEE Computer Society. Google Scholar
  6. J. M. Calandrino and J. H. Anderson. On the design and implementation of a cache-aware multicore real-time scheduler. In 2009 21st Euromicro Conference on Real-Time Systems, pages 194-204, 2009. URL:
  7. W. Cohen. Multiple Architecture Characterization of the Build Process with OProfile, 2003. URL:
  8. J. Corbet, J. Edge, and R. Sobol. Kernel Development. Linux Weekly News -, 2004. [Online; accessed 7-May-2019].
  9. C. Dall and J. Nieh. Kvm/arm: The design and implementation of the linux arm hypervisor. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '14, page 333–348, New York, NY, USA, 2014. Association for Computing Machinery. URL:
  10. The Linux Foundation. perf: Linux profiling with performance counters. URL:
  11. R. Mancuso G. Ghaemi, D. Tarapore. BU Black-box Profiler., 2021.
  12. G. Gracioli, A. Alhammad, R. Mancuso, A. A. Fröhlich, and R. Pellizzoni. A survey on cache management mechanisms for real-time embedded systems. ACM Comput. Surv., 48(2), 2015. URL:
  13. G. Gracioli, R. Tabish, R. Mancuso, R. Mirosanlou, R. Pellizzoni, and M. Caccamo. Designing Mixed Criticality Applications on Modern Heterogeneous MPSoC Platforms. In Sophie Quinton, editor, 31th Euromicro Conference on Real-Time Systems (ECRTS 2019), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 27:1-27:25, Stuttgart, Germany, July 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL:
  14. M. Hassan. On the off-chip memory latency of real-time systems: Is ddr dram really the best option? In 2018 IEEE Real-Time Systems Symposium (RTSS), pages 495-505, 2018. URL:
  15. ARM Holdings. ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile (version G.a), 2011. Google Scholar
  16. H. Kim, A. Kandhalu, and R. Rajkumar. A coordinated approach for practical os-level cache management in multi-core real-time systems. In 2013 25th Euromicro Conference on Real-Time Systems, pages 80-89, 2013. URL:
  17. H. Kim and R. Rajkumar. Real-time cache management for multi-core virtualization. In 2016 International Conference on Embedded Software (EMSOFT), pages 1-10, 2016. URL:
  18. H. Kim and R. (Raj) Rajkumar. Predictable shared cache management for multi-core real-time virtualization. ACM Trans. Embed. Comput. Syst., 17(1), 2017. URL:
  19. N. Kim, B. C. Ward, M. Chisholm, C. Fu, J. H. Anderson, and F. D. Smith. Attacking the one-out-of-m multicore problem by combining hardware management with mixed-criticality provisioning. In 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 1-12, 2016. URL:
  20. T. Kloda, M. Solieri, R. Mancuso, N. Capodieci, P. Valente, and M. Bertogna. Deterministic memory hierarchy and virtualization for modern multi-core embedded systems. In 2019 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 1-14, 2019. URL:
  21. Y. Kwon, X. Zhang, and D. Xu. Pietrace: Platform independent executable trace. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 48-58, 2013. URL:
  22. J. Liedtke, H. Haertig, and M. Hohmuth. Os-controlled cache predictability for real-time systems. In Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97), RTAS '97, page 213, USA, 1997. IEEE Computer Society. Google Scholar
  23. C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. SIGPLAN Not., 40(6):190–200, June 2005. URL:
  24. R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni. Real-time cache management framework for multi-core architectures. In 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 45-54, 2013. URL:
  25. S. Mittal. A survey of techniques for cache partitioning in multicore processors. ACM Comput. Surv., 50(2), 2017. URL:
  26. P. Modica, A. Biondi, G. Buttazzo, and A. Patel. Supporting temporal and spatial isolation in a hypervisor for arm multicore platforms. In 2018 IEEE International Conference on Industrial Technology (ICIT), pages 1651-1657, 2018. URL:
  27. N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. SIGPLAN Not., 42(6):89–100, June 2007. URL:
  28. A. Patel, M. Daftedar, M. Shalan, and M. W. El-Kharashi. Embedded hypervisor xvisor: A comparative analysis. In 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 682-691, 2015. URL:
  29. A. Pesterev, N. Zeldovich, and R. T. Morris. Locating cache performance bottlenecks using data profiling. In Proceedings of the 5th European Conference on Computer Systems, EuroSys '10, page 335–348, New York, NY, USA, 2010. Association for Computing Machinery. URL:
  30. P. Radojković, S. Girbal, A. Grasset, E. Quiñones, S. Yehia, and F.J. Cazorla. On the evaluation of the impact of shared resources in multithreaded cots processors in time-critical environments. ACM Trans. Archit. Code Optim., 8(4), 2012. URL:
  31. RotateRight. Zoom Performance Analysis Tool. URL:
  32. L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an os-level, software-only pollute buffer. In 2008 41st IEEE/ACM International Symposium on Microarchitecture, pages 258-269, 2008. URL:
  33. P. Sohal, R. Tabish, U. Drepper, and R. Mancuso. E-warp: A system-wide framework for memory bandwidth profiling and management. In 2020 IEEE Real-Time Systems Symposium (RTSS), pages 345-357, Los Alamitos, CA, USA, December 2020. IEEE Computer Society. URL:
  34. D. Tarapore, S. Roozkhosh, S. Brzozowski, and R. Mancuso. Observing the invisible: Live cache inspection for high-performance embedded systems. IEEE Transactions on Computers, pages 1-1, 2021. URL:
  35. S. K. Venkata, I. Ahn, D. Jeon, A. Gupta, C. Louie, S. Garcia, S. Belongie, and M. B. Taylor. SD-VBS: The san diego vision benchmark suite. In 2009 IEEE International Symposium on Workload Characterization (IISWC), pages 55-64, October 2009. URL:
  36. Xilinx, Inc. Zynq ultrascale+ mpsoc data sheet: Overview (v1.8), 2019. URL:
  37. M. Xu, R. Gifford, and L.T. Xuan Phan. Holistic multi-resource allocation for multicore real-time virtualization. In Proceedings of the 56th Annual Design Automation Conference 2019, DAC '19, New York, NY, USA, 2019. Association for Computing Machinery. URL:
  38. Y. Ye, R. West, Z. Cheng, and Y. Li. Coloris: A dynamic cache partitioning system using page coloring. In 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), pages 381-392, 2014. URL:
  39. H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memguard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms. In 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 55-64, 2013. URL:
  40. X. Zhang, S. Dwarkadas, and K. Shen. Towards practical page coloring-based multicore cache management. In Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys '09, page 89–102, New York, NY, USA, 2009. Association for Computing Machinery. URL: