Tag Archives: workshop

BlockWatch: Leveraging Similarity in Parallel Programs for Error Detection

Posted on February 13, 2012 by karthikp | Comments Off

Jiesheng Wei and Karthik Pattabiraman, Proceedings of the IEEE Workshop on Silicon Errors in Logic, System Effects (SELSE), 2012. [ PDF File | Talk ]

Posted in papers

Tagged 2012, Jiesheng, reliability, Resilient, workshop

DIEBA: Diagnosing Intermittent Errors By BackTracing Application Failures

Posted on February 13, 2012 by karthikp | Comments Off

Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, Proceedings of the IEEE Workshop on Silicon Errors in Logic, System Effects (SELSE), 2012. [ PDF File | Talk ]

Posted in papers

Tagged 2012, Layali, reliability, Resilient, workshop

Comparing the Effects of Transient and Intermittent Faults on Programs

Posted on May 7, 2011 by karthikp | Comments Off

Jiesheng Wei, Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, Workshop on Dependable and Secure Nano-Systems (WDSN), 2011. [ PDF File | Talk ]

Posted in papers

Tagged 2011, Jiesheng, Layali, reliability, Resilient, workshop

Formal Diagnosis of Hardware Transient Errors in Programs

Posted on March 27, 2010 by karthikp | Comments Off

Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, Workshop on Silicon Errors in Logic, System Effects (SELSE), 2010. [ PDF File ][ Talk Slides ]

Posted in papers

Tagged 2010, formal, Layali, reliability, workshop

Towards Understanding the Effects of Intermittent Hardware Faults on Programs

Posted on March 14, 2010 by karthikp | Comments Off

Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, Proceedings of the IEEE International Workshop on Dependable and Secure Nano-computing (WDSN), 2010. [ PDF File | Talk ]

Posted in papers

Tagged 2010, Layali, reliability, Resilient, workshop

Automated Derivation and Hardware Implementation of Application-Specific Error Detectors

Posted on November 24, 2009 by karthikp | Comments Off

Karthik Pattabiraman, Giacinto Paolo Saggese, Daniel Chen, Zbigniew Kalbarczyk and Ravishankar Iyer, Workshop on Reliability Issues in High-Performance Computing (HPCRI), 2006.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: This paper proposes a novel technique for automated derivation of fine-grained, application-specific error detectors. An algorithm based on dynamic traces of application execution is developed for extracting the optimal set of error detectors for a target application. An automatic framework is proposed for synthesizing the derived detectors in hardware and enabling low-overhead run-time checking of the application execution. Coverage (evaluated using fault injection) of the error detectors obtained using the proposed methodology, the additional hardware resources, and performance overhead for several benchmark programs are also reported.

Posted in papers

Tagged 2006, detectors, reliability, workshop

Position Paper – ToleRace: Tolerating and Detecting Races

Posted on November 24, 2009 by karthikp | Comments Off

Rahul Nagpal, Karthik Pattabiraman, Darko Kirovski and Benjamin Zorn, Second Workshop on Software Tools for Multi-core Systems (STMCS), 2007.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

This paper introduces ToleRace, a software tool that increases the reliability of multi-threaded programs by tolerating or detecting race conditions. ToleRace modifies the implementation of critical sections at runtime to provide the following benefits. ToleRace allows programs with certain classes of races to operate as though the race did not exist. ToleRace probabilistically allows programmers to detect many of the remaining races when they happen, with low performance overhead. ToleRace achieves its ability to tolerate and detect races by judiciously duplicating shared data inside a critical section, thereby providing an illusion of atomicity when the shared data is updated. Our early experiments reveal that the performance overhead of ToleRace is considerably lower than existing dynamic race detection tools.
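The duplication idea in the abstract can be illustrated with a minimal, single-threaded sketch. This is not the ToleRace implementation; the names `SharedCell`, `CopyingSection`, and `demo` are hypothetical, and the "racing" write is simulated inline rather than coming from a second thread. The section works on a private copy, keeps a snapshot of the value seen at entry, and on exit compares the snapshot with the shared value to see whether anything wrote to it in the meantime.

```python
import copy


class SharedCell:
    """A shared value that callers are expected to guard with a lock."""

    def __init__(self, value):
        self.value = value


class CopyingSection:
    """Hypothetical critical section that works on a copy of the shared value."""

    def __init__(self, cell):
        self.cell = cell
        self.race_detected = False

    def __enter__(self):
        # Two copies: one to remember what we saw, one to work on privately.
        self.snapshot = copy.deepcopy(self.cell.value)
        self.local = copy.deepcopy(self.cell.value)
        return self

    def __exit__(self, exc_type, exc, tb):
        # If the shared value no longer matches the snapshot, something
        # wrote to it while this section was running.
        self.race_detected = self.cell.value != self.snapshot
        if exc_type is None:
            # Publish the private copy, as if the section had run atomically.
            self.cell.value = self.local
        return False


def demo():
    counter = SharedCell(10)
    with CopyingSection(counter) as section:
        section.local += 1
        counter.value = 99  # simulated unsynchronized write from elsewhere
    return counter.value, section.race_detected
```

In this sketch the unsynchronized write is both flagged and overwritten by the section's own result. Which copy should win in a given situation is a policy decision that the sketch does not try to model.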

Comments Off on Position Paper – ToleRace: Tolerating and Detecting Races

Posted in papers

Tagged 2007, race, reliability, workshop

Processor-level Selective Replication

Posted on November 23, 2009 by karthikp | Comments Off

Nithin Nakka, Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar Iyer, Workshop on Silicon Errors in Logic – System Effects (SELSE), 2006.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: Even though replication has been widely used in providing fault tolerance, the underlying hardware is unaware of the application executing on it. The application cannot choose to use redundancy for a specific code section and run in a normal, unreplicated mode for the rest of the code. In this paper we propose Processor-level Selective Replication, a mechanism to dynamically configure the degree of instruction-level replication according to the application's demands. The application can choose to replicate only the code sections that are critical to its crash-free execution, which reduces the performance impact. It is also known that many processor-level faults do not lead to failures observable in the application outcome, so selective replication also decreases the number of false positives.
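As a loose software analogy only (the paper describes a processor-level mechanism, which this does not reproduce), selective replication can be pictured as marking just the critical functions for duplicate execution and comparison. The decorator `replicated`, the exception `ReplicaMismatch`, and the two example functions are hypothetical names introduced for illustration.

```python
import functools


class ReplicaMismatch(Exception):
    """Raised when two executions of a replicated function disagree."""


def replicated(fn):
    """Run fn twice and compare the results; assumes fn is deterministic."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        first = fn(*args, **kwargs)
        second = fn(*args, **kwargs)
        if first != second:
            raise ReplicaMismatch(fn.__name__)
        return first

    return wrapper


@replicated
def update_balance(balance, delta):
    # Treated as critical in this sketch: executed twice and checked.
    return balance + delta


def format_report(balance):
    # Treated as non-critical: executed once, with no checking overhead.
    return f"balance={balance}"
```

Only `update_balance` pays the cost of the second execution. An error that affects only `format_report` goes unchecked, which is the trade-off the abstract describes.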

Posted in papers

Tagged 2006, reliability, Trusted Illiac, workshop

FPGA Hardware Implementation of Statically Derived Error Detectors

Posted on November 23, 2009 by karthikp | Comments Off

Peter Klemperer, Shelley Chen, Karthik Pattabiraman, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Workshop on Dependable and Secure Nanocomputing (WDSN), 2007.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: Previous software-only error detection techniques have provided high-coverage, low-latency detection but suffer significant performance overheads with a large percentage of benign detections. This paper presents an FPGA hardware implementation of application-aware data error detectors. The detectors are automatically derived at compile time and executed in hardware at runtime, minimizing the performance overhead. We implement the static detectors using the Reliability and Security Engine, which provides a standard interface for developing reliability and security hardware modules. An initial, proof-of-concept model shows that there is only a 2% performance penalty when the detectors are implemented in hardware.

Posted in papers

Tagged 2007, reliability, Trusted Illiac, workshop

Critical Variable Recomputation for Transient Error Detection

Posted on November 23, 2009 by karthikp | Comments Off

Karthik Pattabiraman, Zbigniew Kalbarczyk and Ravishankar Iyer, Workshop on Silicon Errors in Logic – System Effects (SELSE), 2007.
[ PDF File | Talk ]

This paper is superseded by the following conference paper.

Abstract: This paper presents a technique to derive and implement error detectors to protect an application from data errors. The error detectors are derived automatically using compiler-based static analysis from the backward program slice of critical variables in the program. Critical variables are defined as those that are highly sensitive to errors, and deriving error detectors for these variables provides high coverage for errors in any data value used in the program. The error detectors take the form of checking expressions and are optimized for each control flow path followed at runtime. The derived detectors are implemented using a combination of hardware and software.
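To make the idea of a checking expression concrete, here is a small hand-written sketch; in the paper the detectors are derived automatically by the compiler, which this does not attempt. It assumes `index` is the critical variable and that its backward slice consists of `base`, `stride`, and `i`. All function names are hypothetical.

```python
def compute_index(base, stride, i):
    """Original computation; 'index' is treated as the critical variable."""
    offset = stride * i
    index = base + offset
    return index


def index_detector(base, stride, i, index):
    """Checking expression: recompute index from the inputs of its backward slice."""
    return index == base + stride * i


def flip_bit(value, bit):
    """Simulate a transient error by flipping one bit of an integer."""
    return value ^ (1 << bit)
```

For example, `compute_index(1000, 8, 3)` yields a value that passes `index_detector`, while the same value after `flip_bit(..., 2)` fails the check. An error in `base`, `stride`, or `i` themselves would affect both computations equally and would not be caught by this particular expression.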

Posted in papers

Tagged 2007, detectors, reliability, workshop
