Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben, Workshop on Software Certification (WoSoCER), 2018, co-located with the IEEE International Symposium on Software Reliability Engineering (ISSRE). 2018. [ PDF | Talk Slides ] (Code)
Abstract: Machine Learning (ML) applications have emerged as the killer applications for next generation hardware and software platforms, and there is a lot of interest in software frameworks to build such applications. TensorFlow is a high-level dataflow framework for building ML applications and has become the most popular one in the recent past. ML applications are also being increasingly used in safety-critical systems such as self-driving cars and home robotics. Therefore, there is a compelling need to evaluate the resilience of ML applications built using frameworks such as TensorFlow. In this paper, we build a high-level fault injection framework for TensorFlow called TensorFI for evaluating the resilience of ML applications. TensorFI is flexible, easy to use, and portable. It also allows ML application programmers to explore the effects of different parameters and algorithms on error resilience.