Discriminative Coherence: Balancing Performance and Latency Bounds in Data-Sharing Multi-Core Real-Time Systems

Author: Mohamed Hassan



Author Details

Mohamed Hassan
  • McMaster University, Hamilton, Canada

Cite As

Mohamed Hassan. Discriminative Coherence: Balancing Performance and Latency Bounds in Data-Sharing Multi-Core Real-Time Systems. In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 165, pp. 16:1-16:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ECRTS.2020.16

Abstract

Tasks in modern multi-core real-time systems share data and communicate with each other. Nonetheless, the majority of published research in real-time systems either assumes that tasks do not share data or prohibits data sharing by design. Only recently have some works investigated solutions that address this limitation and enable data sharing; however, we find that these works suffer from severe limitations. In particular, approaches that bypass private caches to avoid coherence interference altogether suffer from significant average-case performance degradation. On the other hand, proposed predictable cache coherence protocols increase the worst-case memory latency (WCL) quadratically due to coherence interference. In this paper, by carefully analyzing the scenarios that lead to high coherence interference, we make the following observation: a protocol that distinguishes between non-modifying (read) and modifying (write) memory accesses is key to reducing the effects of coherence interference on WCL. Accordingly, we propose DISCO, a discriminative coherence solution that capitalizes on this observation to balance average-case performance and WCL. DISCO disallows modified data in private caches and thereby avoids the significant coherence delays that such data causes. In addition, DISCO achieves high average performance by allowing tasks to simultaneously read shared data in the private caches. Moreover, if the system supports the distinction between private and shared data, DISCO further improves average performance by allowing private data to be cached in cores' private caches regardless of whether it is modified. Our evaluation shows that DISCO achieves 7.2× lower latency bounds than the state-of-the-art predictable coherence protocol. DISCO also achieves up to 11.4× (5.3× on average) better performance than private cache bypassing for the SPLASH-3 benchmarks.
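
The caching rule described in the abstract can be illustrated with a small sketch. This is not the DISCO protocol itself, whose states and transitions are defined in the paper; it only encodes the three statements made above as a decision function. The names `Access` and `may_reside_in_private_cache`, and the `data_is_private` flag, are hypothetical and introduced here purely for illustration.

```python
from enum import Enum


class Access(Enum):
    """The two access kinds the abstract distinguishes."""
    READ = "non-modifying"
    WRITE = "modifying"


def may_reside_in_private_cache(access: Access, data_is_private: bool = False) -> bool:
    """Return True if the accessed data may be held in a core's private cache.

    Illustrative assumption: `data_is_private` is True only when the system
    can tell private data from shared data and has marked this data private.
    """
    if data_is_private:
        # Private data may be cached whether or not it is modified.
        return True
    # Shared data: reads may be cached (by several cores at once),
    # modified data may not.
    return access is Access.READ


if __name__ == "__main__":
    for access in Access:
        for private in (False, True):
            allowed = may_reside_in_private_cache(access, private)
            print(f"{access.name:5} private={private!s:5} -> {allowed}")
```

The sketch deliberately says nothing about where modified shared data is served from or how stale read copies are handled, since the abstract does not specify either.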

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time systems
  • Computer systems organization → Real-time system architecture
  • Computer systems organization → Multicore architectures
Keywords
  • Coherence
  • Shared Data
  • Caches
  • Multi-Core
  • Real-Time
  • Memory
