Leonardo Bautista-Gomez, Franck Cappello, Luigi Carro, Nathan DeBardeleben, Bo Fang, Sudhanva Gurumurthi, Karthik Pattabiraman, Paolo Rech, Embedded tutorial, International Symposium on Design, Automation & Test in Europe (DATE’14), Dresden, Germany.
Abstract: GPGPUs are increasingly used in several domains, from gaming to different kinds of computationally intensive applications. In many applications, GPGPU reliability is becoming a serious issue, and several research activities are focusing on its evaluation. This paper gives an overview of some major results in the area. First, it presents and analyzes the results of experiments that assess GPGPU reliability in HPC data centers. Second, it provides recent results on the reliability of some GPGPUs, derived from radiation experiments. Finally, it describes the characteristics of an advanced fault injection environment that allows one to effectively evaluate the resiliency of applications running on GPGPUs.