Tag Archives: 2006

Automated Derivation and Hardware Implementation of Application-Specific Error Detectors

Karthik Pattabiraman, Giacinto Paolo Saggese, Daniel Chen, Zbigniew Kalbarczyk and Ravishankar Iyer, Workshop on Reliability Issues in High-Performance Computing (HPCRI), 2006.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: This paper proposes a novel technique for automated derivation of fine-grained, application-specific error detectors. An algorithm based on dynamic traces of application execution is developed for extracting the optimal set of error detectors for a target application. An automatic framework is proposed for synthesizing the derived detectors in hardware and enabling low-overhead run-time checking of the application execution. Coverage (evaluated using fault injection) of the error detectors obtained using the proposed methodology, the additional hardware resources, and performance overhead for several benchmark programs are also reported.

Processor-level Selective Replication

Nithin Nakka, Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar Iyer, Workshop on Silicon Errors in Logic - System Effects (SELSE), 2006.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: Even though replication has been widely used in providing fault tolerance, the underlying hardware is unaware of the application executing on it. The application cannot choose to use redundancy for a specific code section and run in a normal, unreplicated mode for the rest of the code. In this paper we propose Processor-level Selective Replication, a mechanism to dynamically configure the degree of instruction-level replication according to the application's demands. The application can choose to replicate only code sections that are critical to its crash-free execution. This decreases the impact on performance. It is also known that many processor-level faults do not lead to failures observable in the application outcome. So, selective replication also decreases the number of false positives.

Dynamic Derivation of Application-specific Error Detectors and their Hardware Implementation

Karthik Pattabiraman, Giacinto Paolo Saggese, Daniel Chen, Zbigniew Kalbarczyk and Ravishankar Iyer, Proceedings of the European Dependable Computing Conference (EDCC), 2006.
[ PDF File | Talk ]

Abstract: This paper proposes a novel technique for preventing a wide range of data errors from corrupting the execution of applications. The proposed technique enables automated derivation of fine-grained, application-specific error detectors. An algorithm based on dynamic traces of application execution is developed for extracting the set of error detector classes, parameters, and locations in order to maximize the error detection coverage for a target application. The paper also presents an automatic framework for synthesizing the set of detectors in hardware to enable low-overhead run-time checking of the application execution. Coverage (evaluated using fault injection) of the error detectors derived using the proposed methodology, the additional hardware resources needed, and performance overhead for several benchmark programs are also reported.

This paper is superseded by the following journal paper.
