From Iteration to System Failure: Characterizing the FITness of Periodic Weakly-Hard Systems

Gujarati, Arpan; Nasri, Mitra; Majumdar, Rupak; Brandenburg, Björn B.

doi:10.4230/LIPIcs.ECRTS.2019.9

Abstract

Estimating metrics such as the Mean Time To Failure (MTTF) or its inverse, the Failures-In-Time (FIT), is a central problem in the reliability estimation of safety-critical systems. To this end, prior work in the real-time and embedded systems community has focused on bounding the probability of failures in a single iteration of the control loop, resulting in, for example, the worst-case probability of a message transmission error due to electromagnetic interference, or an upper bound on the probability of a skipped or an incorrect actuation. However, periodic systems, which can be found at the core of most safety-critical real-time systems, are routinely designed to be robust to a single fault or to occasional failures (for example, control applications are usually robust to a few skipped or misbehaving control loop iterations). Thus, obtaining long-run reliability metrics like MTTF and FIT from single-iteration estimates by calculating the time to first fault can be quite pessimistic. Instead, overall system failures for such systems are better characterized using multi-state models such as weakly-hard constraints. In this paper, we describe and empirically evaluate three orthogonal approaches, PMC, Mart, and SAp, for the sound estimation of a system’s MTTF, starting from a periodic stochastic model characterizing the failure in a single iteration of a periodic system, and using weakly-hard constraints as a measure of system robustness. PMC and Mart are exact analyses based on Markov chain analysis and martingale theory, respectively, whereas SAp is a sound approximation based on numerical analysis. We evaluate these techniques empirically in terms of their accuracy and numerical precision, their expressiveness for different definitions of weakly-hard constraints, and their space and time complexities, which affect their scalability and applicability in different regions of the space of weakly-hard constraints.
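To illustrate why a time-to-first-fault estimate can be pessimistic, the following minimal sketch compares the expected number of iterations until the first failed iteration with the expected number of iterations until an (m, k) weakly-hard constraint is first violated. It is not the paper's PMC, Mart, or SAp analysis. It is a small absorbing Markov chain written for illustration, under these assumptions: iteration failures are independent with a fixed probability `p`, the constraint is read as "at most `m` failed iterations in any `k` consecutive iterations", and the system starts with a history of successful iterations. The names `solve_linear` and `iterations_to_violation` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def iterations_to_violation(p, m, k):
    """Expected number of iterations until some window of k consecutive
    iterations contains more than m failed iterations.

    Assumes IID iteration failures with probability p and an initial
    history of successful iterations (illustrative assumptions).
    """
    if not (0 <= m < k) or not (0 < p <= 1):
        raise ValueError("need 0 <= m < k and 0 < p <= 1")
    p = Fraction(p)
    q = 1 - p
    # Transient states: the last k-1 outcomes (1 = failed iteration),
    # restricted to histories that have not yet violated the constraint.
    states = [s for s in product((0, 1), repeat=k - 1) if sum(s) <= m]
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    # Build (I - Q), where Q holds transition probabilities between
    # transient states; transitions into the violation state are omitted.
    A = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for s, i in index.items():
        for outcome, prob in ((0, q), (1, p)):
            window = s + (outcome,)
            if sum(window) <= m:
                A[i][index[window[1:]]] -= prob
    # Expected hitting times satisfy (I - Q) E = 1.
    expected = solve_linear(A, [Fraction(1)] * n)
    return expected[index[(0,) * (k - 1)]]


if __name__ == "__main__":
    p = Fraction(1, 1000)          # assumed per-iteration failure probability
    period_s = Fraction(1, 100)    # assumed 10 ms period
    for m, k in ((0, 1), (1, 2), (2, 5)):
        n_iter = iterations_to_violation(p, m, k)
        print(f"(m={m}, k={k}): MTTF ~ {float(n_iter * period_s):.3e} s")
```

With `m = 0` and `k = 1` the function reduces to the time to first fault, `1/p` iterations. Tolerating a single isolated failure (`m = 1`, `k = 2`) already raises the expectation to `(1 + p)/p^2` iterations, which is the gap the abstract describes. The state space grows as roughly `2^(k-1)`, so this exact enumeration is only practical for small `k`.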

