Tag Archives: Resilient

A Tale of Two Injectors: End-to-End Comparison of IR-level and Assembly-Level Fault Injection

Lucas Palazzi, Guanpeng Li, Bo Fang, and Karthik Pattabiraman, IEEE International Symposium on Software Reliability Engineering (ISSRE), 2019. (Acceptance Rate: 31.4%) [ PDF | Talk ] (Code)

BinFI: An Efficient Fault Injector for Safety-Critical Machine Learning Systems

Zitao Chen, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2019. (Acceptance Rate: 21%) [ PDF | Talk ] (Code) (Finalist for the SC reproducibility challenge, one of 3 papers)

BonVoision: Leveraging Spatial Data Smoothness for Recovery from Memory Soft Errors

Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, and Sriram Krishnamurthy, Proceedings of the ACM International Conference on Supercomputing (ICS), 2019. (Acceptance Rate: 23.2%) [ PDF | Talk ]

TensorFI: A Configurable Fault Injector for TensorFlow Applications

Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben, Workshop on Software Certification (WoSoCER), co-located with the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2018. [ PDF | Talk Slides ] (Code)

Modeling Soft-Error Propagation in Programs

Guanpeng Li, Karthik Pattabiraman, Siva Kumar Sastry Hari, Michael Sullivan, and Timothy Tsai. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. (Acceptance Rate for Regular Papers: 25%) [ PDF | Talk ] (Link to Code) (Best Paper Runner up)

Modeling Input Dependent Error Propagation in Programs

Guanpeng Li and Karthik Pattabiraman, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. (Acceptance Rate for Regular Papers: 25%) [PDF | Talk] (Link to Code)

Understanding Error Propagation in Deep-Learning Neural Networks (DNN) Accelerators and Applications

Guanpeng Li, Siva Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, Stephen Keckler, International Conference for High-Performance Computing, Networking, Storage and Analysis (SC), 2017. (Acceptance Rate: 19%) [PDF | Talk] (Injector code)
Chosen for IEEE Top Picks in Test and Reliability (TPTR), 2023.

LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures

Bo Fang, Qiang Guan, Nathan DeBardeleben, Karthik Pattabiraman, and Matei Ripeanu, ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2017. (Acceptance Rate: 19%) [ PDF | Talk ]

IPA: Error Propagation Analysis of Multi-threaded Programs Using Likely Invariants

Abraham Chan, Stefan Winter, Habib Saissi, Karthik Pattabiraman and Neeraj Suri. Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017. (Acceptance Rate: 27%) [PDF | Talk]

One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors

Behrooz Sangchoolie, Karthik Pattabiraman, and Johan Karlsson, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2017. (Acceptance Rate: 23%). [ PDF | Talk ]