Tag Archives: Resilient

BonVoision: Leveraging Spatial Data Smoothness for Recovery from Memory Soft Errors

Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu and Sriram Krishnamurthy, Proceedings of the ACM International Conference on Supercomputing (ICS), 2019. (Acceptance Rate: 23.2%) [ PDF | Talk ]

Filed under papers

TensorFI: A Configurable Fault Injector for TensorFlow Applications

Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben, Workshop on Software Certification (WoSoCER), co-located with the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2018. [ PDF | Talk Slides ] (Code)

Modeling Soft-Error Propagation in Programs

Guanpeng Li, Karthik Pattabiraman, Siva Kumar Sastry Hari, Michael Sullivan, and Timothy Tsai. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. (Acceptance Rate for Regular Papers: 25%) [ PDF | Talk ] (Link to Code) (Best Paper Runner-up)

Modeling Input Dependent Error Propagation in Programs

Guanpeng Li and Karthik Pattabiraman, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. (Acceptance Rate for Regular Papers: 25%) [PDF | Talk] (Link to Code)

Understanding Error Propagation in Deep-Learning Neural Networks (DNN) Accelerators and Applications

Guanpeng Li, Siva Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, Stephen Keckler, International Conference for High-Performance Computing, Networking, Storage and Analysis (SC), 2017. (Acceptance Rate: 19%) [PDF | Talk] (Injector code)
Chosen for IEEE Top Picks in Test and Reliability (TPTR), 2023.

LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures

Bo Fang, Qiang Guan, Nathan DeBardeleben, Karthik Pattabiraman, and Matei Ripeanu, ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2017. (Acceptance Rate: 19%) [ PDF | Talk ]

One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors

Behrooz Sangchoolie, Karthik Pattabiraman, and Johan Karlsson, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2017. (Acceptance Rate: 23%). [ PDF | Talk ]

IPA: Error Propagation Analysis of Multi-threaded Programs Using Likely Invariants

Abraham Chan, Stefan Winter, Habib Saissi, Karthik Pattabiraman and Neeraj Suri. Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017. (Acceptance Rate: 27%) [PDF | Talk]

Configurable Detection of SDC-Causing Errors in Programs

Qining Lu, Guanpeng Li, Karthik Pattabiraman, Meeta Gupta and Jude Rivers, ACM Transactions on Embedded Computing Systems (TECS). [ PDF ]

Understanding Error Propagation in GPGPU Applications

Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher and Pradip Bose, International Conference for High-Performance Computing, Networking, Storage and Analysis (SC), 2016. (Acceptance Rate: 18%) [PDF | Talk] (Link to LLFI-GPU)
