Abstract: Extreme CMOS technology scaling is raising significant concerns about the reliability of computer systems. Intermittent hardware errors are non-deterministic bursts of errors that occur in the same physical location. Recent studies have found that 40% of processor failures in real-world machines are due to intermittent hardware errors. Studying the effects of intermittent faults on programs is a critical step toward building fault-tolerance techniques of reasonable accuracy and cost. In this work, we characterize the impact of intermittent hardware faults on programs using fault-injection campaigns in a microarchitectural processor simulator. We find that 80% of the non-benign intermittent hardware errors activate a hardware trap in the processor, and the remaining 20% cause Silent Data Corruptions (SDCs). We have also investigated the possibility of using the program state at failure time in software-based diagnosis techniques, and found that much of the erroneous data is intact and can be used to identify the source of the error.
Email: firstname.lastname@example.org
Phone: 604-827-4245 (please email first)
Address: Rm. 4048, Fred Kaiser Building, 2332 Main Mall, Vancouver, BC V6T1Z4.
- Out of Control: Stealthy Attacks on Robotic Vehicles Protected by Control-Based Techniques
- A Tale of Two Injectors: End-to-End Comparison of IR-level and Assembly-Level Fault Injection
- BinFI: An Efficient Fault Injector for Safety-Critical Machine Learning Systems
- OneOS: IoT Platform based on Posix and Actors
- BonVoision: Leveraging Spatial Data Smoothness for Recovery from Memory Soft Errors
- Design-Level and Code-Level Security Analysis of IoT Devices
- Failure Prediction in the Internet of Things due to Memory Exhaustion
- CORGIDS: A Correlation-based Generic Intrusion Detection System
- TensorFI: A Configurable Fault Injector for TensorFlow Applications
- DynPolAC: Dynamic Policy-based Access Control for IoT Systems