Ali Asgari, Florian Geissler, Syed Qutub, Michael Paulitsch, Prashant Nair, and Karthik Pattabiraman, To appear in the Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2023. (Acceptance Rate: 23.9%) [ PDF | Talk ] (code). Artifacts Available and Functional
Abstract: The advent of High Performance Computing has led to the adoption of Convolutional Neural Networks (CNNs) in safety-critical applications such as autonomous vehicles. However, CNNs are vulnerable to DRAM errors corrupting their parameters, thereby degrading their accuracy. Existing techniques for protecting CNNs from DRAM errors are either expensive or fail to protect against large-granularity, multi-bit errors, which occur commonly in DRAMs.
We propose a software-implemented coding scheme, Structural Coding (SC), for protecting CNNs from large-granularity memory errors. SC achieves a three-orders-of-magnitude reduction in the Silent Data Corruption (SDC) rates of CNNs compared to no protection. Its average error correction coverage is also significantly higher than that of other software techniques for protecting CNNs from memory faults. Further, its average performance, memory, and energy overheads are 3%, 15.71%, and 4.38%, respectively. These overheads are much lower than those of other software protection techniques.