Lucas Palazzi, Guanpeng Li, Bo Fang, and Karthik Pattabiraman, IEEE Transactions on Dependable and Secure Computing (TDSC). (Acceptance date: March 2020). [PDF] (Code)
This paper is an extended version of the following conference paper
Abstract: Fault injection (FI) is a commonly used experimental technique to evaluate the resilience of software techniques for tolerating hardware faults. Software-implemented FI can be performed at different levels of abstraction in the system stack; FI performed at the compiler’s intermediate representation (IR) level has the advantage that it is closer to the program being evaluated and is hence easier to derive insights from for the design of software fault-tolerance mechanisms. Unfortunately, it is not clear how accurate IR-level FI is vis-a-vis FI performed at the assembly code level, and prior work has presented contradictory findings. In this paper, we perform a comprehensive evaluation of the accuracy of IR-level FI across a range of benchmark programs and compiler optimization levels. Our results show that IR-level FI is as accurate as assembly-level FI for silent data corruption (SDC) probability estimation across different benchmarks and optimization levels. Further, we present a machine-learning-based technique for improving the accuracy of crash probability measurements made by IR-level FI, which takes advantage of an observed correlation between program crash probabilities and instructions that operate on memory address values. We find that the machine learning technique provides comparable accuracy for IR-level FI as assembly code level FI for program crashes.
- Systematically Assessing the Security Risks of AI/ML-enabled Connected Healthcare Systems
- ImmunoPlane: Middleware for Providing Adaptivity to Distributed Internet-of-Things Applications
- Diagnosis-guided Attack Recovery for Securing Robotic Vehicles from Sensor Deception Attacks
- Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction
- Characterizing and Improving Resilience of Accelerators to Memory Errors in Autonomous Robots
- EdgeEngine: A Thermal-Aware Optimization Framework for Edge Inference
- Evaluating the Effect of Common Annotation Faults on Object Detection Techniques
- Resilience Assessment of Large Language Models under Transient Hardware Faults
- Mixed Precision Support in HPC Applications: What About Reliability?
- Towards Reliability Assessment of Systolic Arrays against Stuck-at Faults
- About us