A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic

Authors: Denis Hoornaert, Shahin Roozkhosh, Renato Mancuso




File

LIPIcs.ECRTS.2021.2.pdf
  • Filesize: 0.83 MB
  • 22 pages


Author Details

Denis Hoornaert
  • TU München, Germany
Shahin Roozkhosh
  • Boston University, MA, USA
Renato Mancuso
  • Boston University, MA, USA

Cite As

Denis Hoornaert, Shahin Roozkhosh, and Renato Mancuso. A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic. In 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 196, pp. 2:1-2:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ECRTS.2021.2

Abstract

The sharp increase in demand for performance has prompted an explosion in the complexity of modern multi-core embedded systems. This has led to unprecedented temporal unpredictability concerns in Cyber-Physical Systems (CPS). On-chip integration of programmable logic (PL) alongside a conventional Processing System (PS) in modern Systems-on-Chip (SoC) establishes a genuine compromise between specialization, performance, and reconfigurability. In addition to typical use-cases, it has been shown that the PL can be used to observe, manipulate, and ultimately manage memory traffic generated by a traditional multi-core processor. This paper explores the possibility of PL-aided memory scheduling by proposing a Scheduler In-the-Middle (SchIM). We demonstrate that the SchIM enables transaction-level control over the main memory traffic generated by a set of embedded cores. Focusing on extensibility and reconfigurability, we put forward a SchIM design covering two main objectives: first, to provide a safe playground to test innovative memory scheduling mechanisms; and second, to establish a transition path from software-based memory regulation to provably correct hardware-enforced memory scheduling. We evaluate our design through a full-system implementation on a commercial PS-PL platform using synthetic and real-world benchmarks.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time system architecture
Keywords
  • Memory Scheduling
  • PLIM
  • FPGA
  • Memory Management
  • Bandwidth Regulation
  • MemGuard
  • Coloring
  • Bank Partitioning
  • Real-time
  • Multicore
  • Safety-critical
