Abraham Chan, Udit Agarwal, and Karthik Pattabiraman. To appear at the IEEE International Workshop on Software Certification (WoSoCER'21), co-located with the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2021. [ PDF | Talk ] (Code)
Abstract: As machine learning (ML) has become more prevalent across many critical domains, so has the need to understand the resilience of ML systems. While previous work has focused on building ML fault injectors at the application level, there has been little work enabling fault injection of ML applications at a lower level. We present LLTFI, a tool under development, which allows users to run fault injection experiments on C/C++, TensorFlow, and PyTorch applications at the LLVM IR level. LLTFI provides users with finer fault injection granularity and a better ability to understand how faults manifest and propagate between the programmed and ML components. We demonstrate how LLTFI can be applied to an ML application through an end-to-end example.
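To give a sense of the kind of hardware fault such a tool emulates, here is a minimal conceptual sketch (not LLTFI's actual API or mechanism) of a single-bit-flip injection into a 32-bit floating-point value, the sort of transient fault an IR-level injector can introduce into an instruction's result:

```python
# Conceptual illustration only: flip one bit in the IEEE-754 binary32
# representation of a value, emulating a single-event upset (SEU).
# The function name and usage below are hypothetical, not LLTFI's API.
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with bit `bit` (0-31) of its binary32 encoding flipped."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    corrupted = as_int ^ (1 << bit)
    (result,) = struct.unpack("<f", struct.pack("<I", corrupted))
    return result


# Example: corrupt a hypothetical activation value at a random bit position.
activation = 0.5
faulty = flip_bit(activation, random.randrange(32))
print(activation, "->", faulty)
```

Flipping a high-order exponent or sign bit can change a value drastically, while a low-order mantissa flip may be masked entirely; studying which of these outcomes propagates to an incorrect prediction is the kind of question a fault injection campaign answers.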