HWP: Hardware Support to Reconcile Cache Energy, Complexity, Performance and WCET Estimates in Multicore Real-Time Systems

Authors: Pedro Benedicte, Carles Hernandez, Jaume Abella, Francisco J. Cazorla




File

LIPIcs.ECRTS.2018.3.pdf
  • Filesize: 1.56 MB
  • 22 pages

Author Details

Pedro Benedicte
  • Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, Barcelona, Spain
Carles Hernandez
  • Barcelona Supercomputing Center, Barcelona, Spain
Jaume Abella
  • Barcelona Supercomputing Center, Barcelona, Spain
Francisco J. Cazorla
  • Barcelona Supercomputing Center and IIIA-CSIC, Barcelona, Spain

Cite As

Pedro Benedicte, Carles Hernandez, Jaume Abella, and Francisco J. Cazorla. HWP: Hardware Support to Reconcile Cache Energy, Complexity, Performance and WCET Estimates in Multicore Real-Time Systems. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 3:1-3:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ECRTS.2018.3

Abstract

High-performance processors have deployed multilevel cache (MLC) systems for decades. In the embedded real-time market, the use of MLC is also on the rise, with processors for future systems in space, railway, avionics and automotive already featuring two or more cache levels. One of the most critical elements of an MLC system is the write policy, which affects not only key metrics such as performance, WCET estimates, energy/power, and reliability, but also the design of complexity-prone cache coherence protocols and cache reliability solutions. In this paper we present an extensive analysis of the existing write policies, namely write-through (WT) and write-back (WB). In the context of the real-time domain, we show that no write policy is superior for all metrics: WT simplifies the design of the coherence and reliability solutions at the cost of performance, WCET, and energy, while WB improves performance and energy results but complicates cache design. To take the best of each policy, we propose the Hybrid Write Policy (HWP), a low-complexity hardware mechanism that reconciles the benefits of WT in simplifying the cache design (e.g. the coherence solution) with the benefits of WB in improving average performance and WCET estimates as the pressure on the interconnection network increases. Guaranteed performance results show that HWP scales with core count similarly to WB. Likewise, HWP reduces cache energy usage relative to WT, to levels similar to those of WB. These benefits are obtained while retaining the reduced coherence complexity of WT, in contrast to the high coherence costs under WB.
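
The difference in interconnect pressure between the two write policies named in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator and does not model HWP. It assumes a single, fully associative LRU cache, treats WT as no-write-allocate and WB as write-allocate, and counts only the writes forwarded to the next level. The function name `next_level_writes` and its parameters are hypothetical.

```python
from collections import OrderedDict


def next_level_writes(trace, policy, num_lines=4):
    """Count writes sent to the next memory level by a toy LRU cache.

    trace:  iterable of (op, line_address), with op "r" (load) or "w" (store).
    policy: "WT" (write-through, no-write-allocate; an assumption of this
            sketch) or "WB" (write-back, write-allocate).
    """
    if policy not in ("WT", "WB"):
        raise ValueError("policy must be 'WT' or 'WB'")

    cache = OrderedDict()  # line_address -> dirty flag, oldest entry first
    writes = 0

    for op, addr in trace:
        hit = addr in cache
        if hit:
            cache.move_to_end(addr)  # refresh LRU position

        if op == "w" and policy == "WT":
            writes += 1  # every store is forwarded to the next level
            continue     # no allocation on a store miss in this sketch

        if not hit:
            if len(cache) >= num_lines:
                _, dirty = cache.popitem(last=False)  # evict the LRU line
                if dirty:
                    writes += 1  # WB pays only when a dirty line leaves
            cache[addr] = False

        if op == "w":
            cache[addr] = True  # WB: mark the line dirty, defer the write

    # Flush: dirty lines still cached must eventually be written back.
    writes += sum(cache.values())
    return writes


# Ten stores to the same line, followed by one load of another line.
trace = [("w", 0)] * 10 + [("r", 1)]
print("WT:", next_level_writes(trace, "WT"))
print("WB:", next_level_writes(trace, "WB"))
```

For this trace the WT model forwards all ten stores, while the WB model writes the dirty line back once. Repeated stores to the same line are where the two policies diverge most in the traffic they place on the interconnect.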

Subject Classification

ACM Subject Classification
  • Computer systems organization → Parallel architectures
  • Computer systems organization → Embedded systems
  • Computer systems organization → Real-time systems
  • Computer systems organization → Dependable and fault-tolerant systems and networks
Keywords
  • multilevel caches
  • real-time systems
  • multicores
  • WCET

