Anna Thomas, Jacques Clapach and Karthik Pattabiraman. To appear in the ACM International Workshop on Algorithmic and Application Error Resilience (AER), 2013.
Abstract: While hardware errors are on the rise as chip feature sizes shrink, users of commodity systems expect a near-faultless experience with little performance degradation. Developers tune for higher performance by enabling compiler optimizations, but these optimizations affect the resilience of applications. This makes it difficult to maintain an error resilience guarantee when multiple optimizations are applied together (e.g., -O3 in gcc).
We focus on soft computing applications (e.g., multimedia applications), which can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We term outcomes that deviate significantly from the error-free outcomes Egregious Data Corruptions (EDCs). We study how four specific compiler optimizations affect the resilience of soft computing applications. We also investigate how the optimizations change where detectors should be placed to catch EDC-causing faults. This helps us identify safe compiler optimizations that maintain a given guarantee on the error resilience of the application. Our work is a first step towards identifying the performance-resilience tradeoff space.
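The EDC definition above hinges on whether a faulty output "deviates significantly" from the error-free (golden) output. The sketch below shows one way to label a single fault-injection outcome. It assumes a normalized mean absolute deviation as the fidelity metric and a 10% threshold. Both are illustrative assumptions, not the authors' criterion, and real soft computing applications would use an application-specific measure such as PSNR for images. The names `relative_deviation` and `classify_outcome` are hypothetical.

```python
from typing import Sequence


def relative_deviation(golden: Sequence[float], faulty: Sequence[float]) -> float:
    """Mean absolute deviation of the faulty output, normalized by the mean |golden| value.

    This is an assumed fidelity metric chosen for illustration only.
    """
    if len(golden) == 0 or len(golden) != len(faulty):
        raise ValueError("outputs must be non-empty and of equal length")
    diff = sum(abs(g - f) for g, f in zip(golden, faulty)) / len(golden)
    scale = sum(abs(g) for g in golden) / len(golden)
    # Fall back to the absolute deviation when the golden output is all zeros.
    return diff / scale if scale > 0 else diff


def classify_outcome(golden: Sequence[float], faulty: Sequence[float],
                     threshold: float = 0.1) -> str:
    """Label one fault-injection run as 'masked', 'tolerable', or 'EDC'.

    The 0.1 default threshold is a hypothetical acceptability bound.
    """
    if list(golden) == list(faulty):
        return "masked"  # the fault had no visible effect on the output
    if relative_deviation(golden, faulty) > threshold:
        return "EDC"  # egregious: deviates beyond the acceptable bound
    return "tolerable"  # erroneous, but close enough to the golden output


print(classify_outcome([10.0, 10.0], [10.0, 10.5]))  # small deviation
print(classify_outcome([10.0, 10.0], [10.0, 20.0]))  # large deviation
```

Under this kind of scheme, a detector is only worth placing where faults tend to produce `EDC` outcomes. That is why the labeling step matters when comparing detector placement across optimization levels.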