Udit Agarwal, Abraham Chan, and Karthik Pattabiraman, To appear in the Proceedings of the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2023. (Acceptance Rate: 29.5%) [ PDF | Talk ] (Code). Artifacts Available and Reviewed.
Abstract: Large Language Models (LLMs) are transforming the field of natural language processing and revolutionizing the way machines interact with humans. LLMs like ChatGPT and Google’s Bard have already made significant strides in conversational AI, enabling machines to understand natural language and respond in a more human-like manner. In addition to typical applications like sentiment analysis and text generation, LLMs are also used in safety-critical applications such as code generation and speech comprehension in autonomous driving vehicles, where reliability is important.
In this work, we investigate the resilience of LLMs under transient hardware faults. Specifically, we use IR-level fault injection (FI) to assess the reliability of five popular LLMs, including BERT, GPT-2, and T5. We also investigate how the resilience of LLMs varies with the pre-training objective, the fine-tuning objective, and the number of encoder and decoder blocks. We find that LLMs are quite resilient to transient faults overall. We also find that the behavior of an LLM under transient faults varies significantly with the input, the LLM’s architecture, and the type of task (e.g., translation vs. fill-in-the-blank). Finally, we find that the Silent Data Corruption (SDC) rate varies with the fine-tuning objective, and that for the fill-mask fine-tuning objective, the SDC rate also increases with model size.
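For readers unfamiliar with the fault model, the following minimal sketch illustrates what a transient fault and an SDC rate mean. It is not the paper's IR-level injector. It assumes a single bit flip in a 32-bit floating-point value and counts a run as an SDC whenever its output differs from the fault-free ("golden") output, which is a simplification. The function names `flip_bit` and `sdc_rate` are hypothetical and chosen only for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the float32 encoding of value.

    Assumption: a transient fault is modeled as a single bit flip.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask flips exactly the chosen bit.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def sdc_rate(golden, faulty_outputs) -> float:
    """Fraction of runs whose output differs from the golden output.

    Assumption: every mismatch counts as an SDC; crashes and hangs
    are not modeled in this sketch.
    """
    if not faulty_outputs:
        raise ValueError("need at least one fault-injection run")
    mismatches = sum(1 for out in faulty_outputs if out != golden)
    return mismatches / len(faulty_outputs)
```

Flipping a high-order exponent bit typically changes a value far more than flipping a low-order mantissa bit, which is one reason the observed effect of a fault can depend strongly on where it lands.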