{"id":2528,"date":"2014-05-22T21:52:04","date_gmt":"2014-05-23T04:52:04","guid":{"rendered":"https:\/\/blogs.ubc.ca\/karthik\/?p=2528"},"modified":"2014-07-28T10:30:18","modified_gmt":"2014-07-28T17:30:18","slug":"characterizing-the-impact-of-intermittent-hardware-faults-on-programs","status":"publish","type":"post","link":"https:\/\/blogs.ubc.ca\/karthik\/2014\/05\/22\/characterizing-the-impact-of-intermittent-hardware-faults-on-programs\/","title":{"rendered":"Characterizing the Impact of Intermittent Hardware Faults on Programs"},"content":{"rendered":"<p>Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, <a href=\"http:\/\/rs.ieee.org\/transactions-on-reliability.html\">IEEE Transactions on Reliability (TR)<\/a>, In Press (Accepted: May 2014). [ <a href=\"https:\/\/blogs.ubc.ca\/karthik\/files\/2014\/07\/Layali-TR.pdf\">PDF<\/a> ]<br \/>\n<!--more--><\/p>\n<p>Abstract: Extreme CMOS technology scaling is raising significant concerns about the reliability of computer systems. Intermittent hardware errors are non-deterministic bursts of errors that occur in the same physical location. Recent studies have found that 40% of processor failures in real-world machines are due to intermittent hardware errors. Studying the effects of intermittent faults on programs is therefore a critical step toward building fault-tolerance techniques with reasonable accuracy and cost. In this work, we characterize the impact of intermittent hardware faults on programs using fault-injection campaigns in a microarchitectural processor simulator. We find that 80% of the non-benign intermittent hardware errors activate a hardware trap in the processor, and the remaining 20% cause Silent Data Corruptions (SDCs). 
We have also investigated the possibility of using the program state at failure time in software-based diagnosis techniques, and found that much of the erroneous data is intact and can be used to identify the source of the error.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan, IEEE Transactions on Reliability (TR), In Press (Accepted: May 2014). [ PDF ]<\/p>\n","protected":false},"author":10348,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2267],"tags":[416333,2835,416308,7090,416309],"class_list":["post-2528","post","type-post","status-publish","format-standard","hentry","category-publications","tag-416333","tag-journal","tag-layali","tag-reliability","tag-many-core"],"_links":{"self":[{"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/posts\/2528","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/users\/10348"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/comments?post=2528"}],"version-history":[{"count":7,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/posts\/2528\/revisions"}],"predecessor-version":[{"id":2843,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/posts\/2528\/revisions\/2843"}],"wp:attachment":[{"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/media?parent=2528"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/categories?post=2528"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/karthik\/wp-json\/wp\/v2\/tags?post=2528"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}
","templated":true}]}}