Category Archives: papers

Papers published in peer-reviewed conferences, journals or workshops.

Processor-Level Selective Replication

Nithin Nakka, Karthik Pattabiraman and Ravishankar Iyer, Proceedings of the International Conference on Dependable Systems and Networks (DSN), 2007.
[ PDF File | Talk ]

Abstract: Full duplication of an entire application (through spatial or temporal redundancy) would detect many errors that are benign to the application from the end-user's perspective. Duplication has also been observed to incur up to 30% performance overhead and to require significant additional hardware to synchronize the replicas. To overcome both the performance overhead and the detection of “benign” faults, we propose a processor-level technique called Selective Replication, which lets the application choose where in its application stream, and to what degree, it requires replication. Recent work using static analysis and fault-injection experiments on applications reveals that certain variables in an application are critical to its crash- and hang-free execution. If the computation of these variables can be kept error-free, a high degree of crash/hang coverage can be achieved at a low performance overhead. Selective Replication provides an ideal platform for validating this claim. The technique is compared against complete duplication as provided by current architecture-level techniques. The results show that, with about 59% less overhead than full duplication, Selective Replication detects 97% of the data errors and 87% of the instruction errors that were covered by full duplication. It also reduces the detection of errors benign to the final outcome of the application by 17.8% compared with full duplication.
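As a rough software analogue (not the processor-level mechanism the paper describes), the core idea of replicating only the computation of critical variables can be sketched as below. The `replicate` decorator, the `ReplicaMismatch` exception and both example functions are hypothetical, chosen only to show that non-critical code runs once while marked computations are executed twice and compared.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ReplicaMismatch(Exception):
    """Raised when two replicas of a critical computation disagree."""


def replicate(fn: Callable[..., T]) -> Callable[..., T]:
    """Hypothetical decorator: run a critical computation twice and compare.

    Only functions marked with this decorator pay the duplication cost;
    everything else executes once, as in selective (not full) replication.
    """
    def wrapper(*args, **kwargs) -> T:
        first = fn(*args, **kwargs)
        second = fn(*args, **kwargs)
        if first != second:
            raise ReplicaMismatch(f"{fn.__name__}: {first!r} != {second!r}")
        return first
    return wrapper


@replicate
def loop_bound(n: int, stride: int) -> int:
    # A loop bound is an example of a value whose corruption could cause a hang.
    return (n + stride - 1) // stride


def format_report(count: int) -> str:
    # Output formatting is treated as non-critical and runs only once.
    return f"processed {count} blocks"


if __name__ == "__main__":
    blocks = loop_bound(100, 8)
    print(format_report(blocks))  # processed 13 blocks
```

In this sketch the choice of what to replicate is made by the programmer through the decorator; the paper instead exposes that choice to the application at the processor level.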

Application-based Metrics for Strategic Placement of Detectors

Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar K. Iyer, Proceedings of the International Symposium on Pacific-Rim Dependable Computing (PRDC), 2005.
[ PDF File | Talk ]

Abstract: The goal of this study is to provide low-latency detection of value errors and to prevent their propagation. This paper introduces metrics to guide the strategic placement of detectors and evaluates (using fault injection) the coverage provided by ideal detectors embedded at program locations selected using the computed metrics. The computation is represented in the form of a Dynamic Dependence Graph (DDG), a directed acyclic graph that captures the dynamic dependencies among the values produced during the course of program execution. The DDG is employed to model error propagation in the program and to derive metrics (e.g., value fanout or lifetime) for detector placement. The coverage of the detectors placed is evaluated using fault injections in real programs, including two large SPEC95 integer benchmarks (gcc and perl). Results show that a small number of detectors, strategically placed, can achieve a high degree of detection coverage.
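To make the fanout and lifetime metrics concrete, the sketch below builds a tiny dynamic dependence graph from a hypothetical execution trace and ranks values by fanout, breaking ties by lifetime. The trace format and the definition of lifetime used here (trace steps between a value's definition and its last use) are assumptions for illustration, not the paper's exact formulation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One dynamic value in the DDG, produced at `step`."""
    name: str
    step: int
    uses: list[int] = field(default_factory=list)  # steps at which the value is read


def build_ddg(trace: list[tuple[str, list[str]]]) -> dict[str, Node]:
    """Build a DDG from a trace of (defined_value, [values_read]) entries.

    Each entry defines a fresh dynamic value, so the graph is acyclic:
    a value can only depend on values produced at earlier steps.
    """
    nodes: dict[str, Node] = {}
    for step, (target, sources) in enumerate(trace):
        for src in sources:
            nodes[src].uses.append(step)  # records the edge src -> target
        nodes[target] = Node(target, step)
    return nodes


def fanout(node: Node) -> int:
    # Number of later trace steps that directly consume this value.
    return len(node.uses)


def lifetime(node: Node) -> int:
    # Steps between definition and last use (0 if the value is never read).
    return (max(node.uses) - node.step) if node.uses else 0


if __name__ == "__main__":
    # Hypothetical trace: each name is a distinct dynamic value.
    trace = [
        ("n", []),
        ("i0", []),
        ("c0", ["i0", "n"]),
        ("i1", ["i0"]),
        ("c1", ["i1", "n"]),
        ("s", ["i1", "n"]),
    ]
    ddg = build_ddg(trace)
    ranked = sorted(ddg.values(), key=lambda v: (fanout(v), lifetime(v)), reverse=True)
    for v in ranked[:2]:
        print(v.name, fanout(v), lifetime(v))
```

Here `n` ranks first because three later steps read it, making it a natural candidate location for a detector under a fanout-based heuristic.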

Dynamic Derivation of Application-specific Error Detectors and their Hardware Implementation

Karthik Pattabiraman, Giacinto Paulo Saggese, Daniel Chen, Zbigniew Kalbarczyk and Ravishankar Iyer, Proceedings of the European Dependable Computing Conference (EDCC), 2006. [ PDF File | Talk ]

Abstract: This paper proposes a novel technique for preventing a wide range of data errors from corrupting the execution of applications. The proposed technique enables automated derivation of fine-grained, application-specific error detectors. An algorithm based on dynamic traces of application execution is developed for extracting the set of error detector classes, parameters, and locations in order to maximize the error detection coverage for a target application. The paper also presents an automatic framework for synthesizing the set of detectors in hardware to enable low-overhead run-time checking of the application execution. For several benchmark programs, the paper also reports the coverage of the derived error detectors (evaluated using fault injection), the additional hardware resources they require, and their performance overhead.
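A simplified illustration of deriving a detector from dynamic traces: observe the values a program location takes during training runs, choose the tightest detector class consistent with them, and record its parameters. The three classes used here (constant, small value set, range), the `max_set_size` threshold and the `derive_detector` helper are assumptions for illustration; the paper's actual detector classes, selection algorithm and hardware synthesis are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Detector:
    kind: str                     # "constant", "set", or "range" (illustrative classes)
    check: Callable[[int], bool]  # returns True if the value passes the check


def derive_detector(samples: list[int], max_set_size: int = 4) -> Detector:
    """Pick the tightest hypothetical detector class consistent with training samples."""
    if not samples:
        raise ValueError("need at least one training sample")
    distinct = sorted(set(samples))
    if len(distinct) == 1:
        c = distinct[0]
        return Detector("constant", lambda v: v == c)
    if len(distinct) <= max_set_size:
        allowed = frozenset(distinct)
        return Detector("set", lambda v: v in allowed)
    lo, hi = distinct[0], distinct[-1]
    return Detector("range", lambda v: lo <= v <= hi)


if __name__ == "__main__":
    # Values observed at one (hypothetical) program location across training runs.
    det = derive_detector([3, 7, 12, 5, 9, 10])
    print(det.kind, det.check(8), det.check(40))  # range True False
```

Trying the tighter classes first trades false positives (a legal value never seen in training) for coverage; a real derivation would weigh that trade-off against measured detection coverage.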

This paper is superseded by the following journal paper.

Automated Derivation of Application-aware Error Detectors using Static Analysis

Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar Iyer, Proceedings of the IEEE International Online Testing Symposium (IOLTS), 2007. [ PDF File | Talk ]

Abstract: This paper presents a technique to derive and implement error detectors to protect an application from data errors. The error detectors are derived automatically using compiler-based static analysis from the backward program slice of critical variables in the program. Critical variables are defined as those that are highly sensitive to errors, and deriving error detectors for these variables provides high coverage for errors in any data value used in the program. The error detectors take the form of checking expressions and are optimized for each control flow path followed at runtime. The derived detectors are implemented using a combination of hardware and software. Experiments show that the derived detectors incur low performance overheads while achieving high detection coverage for errors that impact the application.
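The sketch below illustrates, on a toy straight-line program, the two ingredients the abstract names: computing the backward slice of a critical variable, and turning that slice into a checking expression that recomputes the variable independently and compares it with the value produced by the main computation. The `Stmt` representation and the `check_critical` helper are hypothetical; the paper derives detectors with a compiler over real programs and specializes them per control-flow path, which this branch-free example does not model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stmt:
    target: str              # variable assigned by this statement
    reads: tuple[str, ...]   # variables it reads
    compute: Callable[..., int]


def backward_slice(program: list[Stmt], critical: str) -> list[Stmt]:
    """Return, in program order, the statements that can affect `critical`."""
    needed = {critical}
    sliced: list[Stmt] = []
    for stmt in reversed(program):
        if stmt.target in needed:
            sliced.append(stmt)
            needed.discard(stmt.target)  # this definition satisfies the need...
            needed.update(stmt.reads)    # ...but introduces its own inputs
    return list(reversed(sliced))


def run(stmts: list[Stmt], inputs: dict[str, int]) -> dict[str, int]:
    env = dict(inputs)
    for s in stmts:
        env[s.target] = s.compute(*(env[r] for r in s.reads))
    return env


def check_critical(program: list[Stmt], inputs: dict[str, int],
                   critical: str, observed: int) -> bool:
    """Hypothetical detector: recompute `critical` from its slice and compare."""
    expected = run(backward_slice(program, critical), inputs)[critical]
    return expected == observed


if __name__ == "__main__":
    program = [
        Stmt("a", ("x",), lambda x: x * 2),
        Stmt("b", ("y",), lambda y: y + 1),  # not in the slice of "idx"
        Stmt("idx", ("a", "x"), lambda a, x: a - x),
    ]
    inputs = {"x": 5, "y": 3}
    state = run(program, inputs)
    print([s.target for s in backward_slice(program, "idx")])        # ['a', 'idx']
    print(check_critical(program, inputs, "idx", state["idx"]))      # True
    print(check_critical(program, inputs, "idx", state["idx"] ^ 4))  # False (corrupted)
```

Because `b` does not feed `idx`, it is excluded from the slice, so the check recomputes only the statements that matter for the critical variable.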

This paper is superseded by the following journal paper.

Automated Derivation of Application-aware Error Detectors using Static Analysis: The Trusted Illiac Approach

Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar Iyer. To appear in IEEE Transactions on Dependable and Secure Computing (TDSC). (Accepted on May 1, 2009). [ PDF File ]
