A Tight Holistic Memory Latency Bound Through Coordinated Management of Memory Resources

Abdelhalim, Shorouk; Germchi, Danesh; Hossam, Mohamed; Pellizzoni, Rodolfo; Hassan, Mohamed

doi:10.4230/LIPIcs.ECRTS.2023.17

Abstract

To facilitate the safe adoption of multi-core platforms in real-time systems, a plethora of recent research efforts aim to bound the delays induced by interference when accessing the shared memory resources in these platforms. These efforts, despite their value, are scattered: each focuses on a single resource, on the premise that latency bounds derived separately for each resource can be added together to provide a safe end-to-end memory latency bound. In this work, we put this assumption to the test for the first time by 1) considering a realistic multi-core memory hierarchy, 2) deriving the bounds for accessing the shared resources in this system, and 3) highlighting the limitations of this widely adopted approach. In particular, we show that this approach leads to bounds that are not only excessively pessimistic but also unsafe. Motivated by these findings, we propose GRROF: a novel approach to predictably and efficiently schedule memory requests as they traverse the entire memory hierarchy, through coordination among the arbiters managing all the resources in this hierarchy. This mechanism allows us to exploit pipelining when analyzing the latency of memory requests, and thus to tightly bound the worst-case latency. We prove that GRROF yields a drastically tighter bound than the common additive latency approach, with more than 18× reduction in the end-to-end memory latency bound for a modern Out-of-Order quad-core platform. The reduction grows significantly as the number of cores increases. The proposed solution is fully prototyped and tested in a cycle-accurate simulation. We also compare it with competitive state-of-the-art real-time solutions and with performance-oriented solutions found in modern Commercial-off-the-Shelf (COTS) platforms.

