Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar K. Iyer, Proceedings of the Pacific Rim International Symposium on Dependable Computing (PRDC), 2005.
Abstract: The goal of this study is to provide low-latency detection of value errors and to prevent their propagation. This paper introduces metrics to guide the strategic placement of detectors and evaluates, using fault injection, the coverage provided by ideal detectors embedded at program locations selected using the computed metrics. The computation is represented as a Dynamic Dependence Graph (DDG), a directed acyclic graph that captures the dynamic dependencies among the values produced during program execution. The DDG is used to model error propagation in the program and to derive metrics (e.g., value fanout or lifetime) for detector placement. The coverage of the placed detectors is evaluated using fault injection in real programs, including two large SPEC95 integer benchmarks (gcc and perl). Results show that a small number of strategically placed detectors can achieve a high degree of detection coverage.
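The idea of ranking program locations by DDG metrics can be illustrated with a minimal sketch. It is not the paper's implementation: the trace format, the function names (`build_ddg`, `fanout`, `lifetime`, `place_detectors`), and the exact metric definitions are assumptions made here for illustration. Fanout is taken to be the number of distinct dynamic instructions that directly use a value, and lifetime the distance (in dynamic instructions) between a value's definition and its last direct use.

```python
from collections import defaultdict

def build_ddg(trace):
    """Build a dynamic dependence graph from an execution trace.

    trace: list of (dest, sources) pairs in execution order (hypothetical
    format). Node i is the value produced by the i-th dynamic instruction.
    Returns a dict mapping each node to the list of nodes that use its value.
    """
    last_writer = {}                 # variable name -> node that last defined it
    successors = defaultdict(list)   # node -> nodes that read its value
    for idx, (dest, sources) in enumerate(trace):
        for src in sources:
            if src in last_writer:
                successors[last_writer[src]].append(idx)
        last_writer[dest] = idx
    return dict(successors)

def fanout(successors, node):
    # Assumed definition: number of distinct direct users of the value.
    return len(set(successors.get(node, [])))

def lifetime(successors, node):
    # Assumed definition: distance from definition to last direct use.
    uses = successors.get(node, [])
    return max(uses) - node if uses else 0

def place_detectors(trace, k, metric):
    """Return the k nodes with the highest metric value (ties: earliest node)."""
    succ = build_ddg(trace)
    scores = {n: metric(succ, n) for n in range(len(trace))}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]

# Toy trace: a = const; b = a + 1; c = a * b; d = a + c; e = d + a
trace = [
    ("a", []),
    ("b", ["a"]),
    ("c", ["a", "b"]),
    ("d", ["a", "c"]),
    ("e", ["d", "a"]),
]
succ = build_ddg(trace)
top_by_fanout = place_detectors(trace, 2, fanout)
top_by_lifetime = place_detectors(trace, 1, lifetime)
```

In this toy trace the value defined by node 0 is read by every later instruction, so it ranks first under both assumed metrics. An error in that value would reach many dependents, which is the intuition behind placing a detector there.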