TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications
Improving the Accuracy of IR-Level Fault Injection
A Tale of Two Injectors: End-to-End Comparison of IR-Level and Assembly-Level Fault Injection
BonVoision: Leveraging Spatial Data Smoothness for Recovery from Memory Soft Errors
LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures
ePVF: An Enhanced Program Vulnerability Factor Methodology for Cross-Layer Resilience Analysis
A Systematic Methodology for Evaluating the Error Resilience of GPGPU Applications
Talk: Tolerating Silent Data Corruption (SDC) causing Hardware Faults Through Software Techniques
Evaluating the Error Resilience of Parallel Programs
GPGPUs: How to Combine High Computational Power with High Reliability