Logistics:
Class: Tue/Thu 10:30 AM to 12:00 PM at MCLD 207
Office hours: TBD in 4048 KAIS
Learning Objectives
- Define common terms such as availability, reliability, dependability etc.
- List common threats to dependability and their mitigation methods
- Solve reliability block diagrams involving series, parallel and networks of components. Apply the laws of discrete probability to evaluating systems.
- Evaluate simple redundancy schemes through the laws of continuous probability, provided the failures are exponentially distributed.
- Apply fault-tolerance techniques such as error correcting circuits and duplicate execution to the design of hardware systems.
- Model systems using Markov models and Stochastic Activity Networks (SAN)
- Apply fault-tolerance techniques such as N-version programming, robust data structures etc. to the design of software systems
- Evaluate the reliability of systems through fault-injections and simulations
- Apply fault-tolerance techniques such as checkpointing and Byzantine agreement to the design of parallel and distributed systems
- Critique the design of real-world fault-tolerant systems such as Tandem, ESS
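
As a rough illustration of the reliability block diagram and exponential-failure objectives above, here is a minimal sketch. It assumes component failures are independent. The function names and numeric values are hypothetical and are not taken from the course materials.

```python
import math

def series(reliabilities):
    """Series block: every component must work, so reliabilities multiply."""
    return math.prod(reliabilities)

def parallel(reliabilities):
    """Parallel block: fails only if all components fail (independence assumed)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

def exponential_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * t)

if __name__ == "__main__":
    # Hypothetical system: two redundant units (0.9 each) in parallel,
    # followed in series by a single unit with reliability 0.95.
    redundant_pair = parallel([0.9, 0.9])
    system = series([redundant_pair, 0.95])
    print(f"redundant pair: {redundant_pair:.4f}")
    print(f"whole system:   {system:.4f}")

    # Hypothetical component with failure rate 0.001 per hour, evaluated at 1000 hours.
    print(f"R(1000 h):      {exponential_reliability(0.001, 1000):.4f}")
```

In this example the redundant pair reaches about 0.99 and the whole system about 0.9405, which shows how a single series element can dominate the reliability of an otherwise redundant design.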
Course Synopsis
This course focuses on the design of fault-tolerant and reliable computer systems. In particular, we will attempt to understand the root causes of faults in computer systems and their impact. We will study both traditional and cutting-edge techniques to provide fault-tolerance and error resilience. Finally, we will explore the practical applications of the techniques in the context of real systems.
An important thread that runs through the course is the evaluation of fault-tolerant systems. To this end, we will study techniques ranging from analytical modeling to empirical validation. The assignments will give you hands-on exposure to cutting edge tools and techniques for dependability evaluation, and will prepare you for the final project. You are encouraged (but not required) to work on a project related to your research interests. The final project constitutes a significant part of the grade.
Assessment
Weightage | Component | Comments |
---|---|---|
40% | Project | Four milestones: proposal (5%), midterm report (10%), final presentation (10%) and final report (15%) |
30% | Assignments | Three assignments each comprising 10% of the grade |
20% | Paper reviews and discussion leading | Approximately two papers each in six sessions (15%), and leading discussion (5%) |
10% | Class participation | In both lectures and discussions |
Topics Covered
Textbooks
There is NO required textbook. However, the following books are recommended:
- D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems – Design and Evaluation, 3rd edition, 1999, A.K. Peters, Limited.
- K. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd edition, 2001, John Wiley & Sons.
Paper Readings
Announcements
This is the place for course announcements. Please check for updates.
- Sep 5th: Our first class will be on Sep 6 at 10:30 AM. See you there.
- Sep 8th: Please read chapters 1-2 of the Trivedi book (see textbooks) for next Thursday’s class. Chapters 3 and 4 are optional reading.
- Sep 13th: Project proposal documents (2 pages) are due Sep 29th. Please set up a time to talk with me about your project on or before Sep 22nd, or come to my office hours.
- Sep 14th : The papers to be discussed on Sep 22nd have been posted. Instructions on doing the reviews and leading the discussions are posted. Failure to follow the instructions can result in you losing points.
- Sep 17th : Examples of projects are posted here – these are only examples. Please define your own project based on your interests and after discussion with me.
- Sep 21st : The class’s reviews for the first two papers have been posted. Please look at them before class tomorrow.
- Sep 22nd : A couple of clarifications about the review submission. First, if you’re the discussion leader of a session, you do not need to submit a review for either paper in your session. Second, for calculating the total marks for the reviews, I’ll automatically drop your lowest score on one session and consider the rest. So if you don’t turn in reviews for one session, that’ll be considered the one session with the lowest score and be dropped.
- Sep 22nd : Assignment 1 has been posted here. It is due on Oct 13th, Thursday, in class. You are expected to work on it individually and turn in your solutions in written form.
- Sep 28 : The papers to be discussed on Oct 6th have been posted. Please turn in your reviews by Oct 5th noon. Update (Oct 5): The reviews of the class have been posted.
- Oct 6 : I’ve posted a password protected link to the Mobius tool here. Mobius is a tool for modeling systems through Stochastic Activity Networks (SANs). We will use Mobius for our second assignment. We have an academic license for using Mobius in this class. Do not distribute Mobius outside the class.
- Oct 12: I’ve posted the papers to be discussed on Oct 20th. Reviews are due by noon on Oct 19th. Update (Oct 19) : Reviews have been posted.
- Oct 12: Assignment 2 has been posted here. It is due on Nov 3rd, Thursday in class. You will need to install the Mobius tool for solving this assignment (see Oct 6th’s announcement).
Update (Oct 27th): Assignment due date postponed to Nov 8, 2011.
- Oct 13: Project midterm reports are due in class on Nov 8th, 2011. The report must contain the motivation, related work and experimental methodology you plan to deploy in your project, and must be no more than six pages in length including references. The format is IEEE Computer Society double column format available here. Note that each group needs to submit only one report.
Update (Oct 27th): Midterm reports due date postponed to Nov 10th, 2011.
- Oct 19: The papers for discussion on Oct 27th have been posted. Reviews are due by noon on Oct 26.
Update (Oct 26): The reviews for the papers have been posted.
- Oct 30 : Many of you have reported trouble with running Mobius under Windows. For this reason, I suggest that you use Mobius in Linux or MacOS X only.
- Oct 31 : We will have a guest lecture on Nov 8th at the regular class time. The speaker will be Charng-da Lu from the Center for Computational Research, SUNY Buffalo. The lecture will be in KAIS 2020.
- Nov 2: The papers for discussion on Nov 10th have been posted. Reviews are due by noon on Nov 9.
Update (Nov 9): The reviews have been posted.
- Nov 7 : Assignment 3 has been posted here. It is due on December 1st, in class. You’ll also need to download the SymPLFIED framework and the tcas program for this assignment. Please do not distribute SymPLFIED outside this class.
- Nov 9 : The papers for discussion on Nov 17th have been posted. Reviews are due by noon on Nov 16. Update (Nov 16): The reviews have been posted.
- Nov 16 : There will be a project presentation session for this class on December 8th, from 9 AM to Noon. This will take the place of the final exam as there is no separate exam. Each group will present their project in this session (details to follow later). You will be expected to attend and participate in the discussion for the entire duration of the session – please let me know asap if you cannot make any part of the session.
- Nov 16 : There will be no class on Nov 29 and Dec 1. Please use the time to work on your projects. You’ll need to hand in assignment 3 in my office on Dec 1st before noon.
- Nov 16 : We will have a paper discussion session on Nov 24, and the papers have been posted. Reviews are due by noon on Nov 23rd. Update (Nov 23): The reviews have been posted.
- Nov 17 : Assignment 3 has some corrections. First, when you execute normal.maude in phase 2, you don’t need to specify the scripts subdirectory in its path because you’re already in that directory (thanks to Anna for pointing this out). Also, Mina pointed out this useful link which explains how to install the gcc cross-compiler for SimpleScalar.
- Nov 22 : Because many of you said you had difficulties installing Simplescalar-gcc, I’ve decided to make available the tcas.maude file which you need to run with SymPLFIED to get the results for Assignment 3. In effect, this means that you only need to do phase 3 of assignment 3. I’m also extending the assignment’s due date to Dec 8th, 9 AM.
Update: You also need to copy this file into the tcas directory before the experiment.
- Nov 22 : As discussed in class today, the final project reports are due by Dec 20th at noon. These should be at most 10 pages in IEEE Computer Society double column format (same as the midterm reports). You need to email these to me in PDF form with the subject line "EECE513: Project report".
- Nov 24 : Today was the last class in this course. There will be no class next week – please work on your projects during this time. I really enjoyed teaching you all. I’d appreciate it if you can take a few minutes to complete the teaching evaluations for the class. Also, feel free to send any suggestions for improvement directly to me.
- Nov 30 : This is a reminder that we will have the project presentations on Dec 8th from 9 AM to Noon. The presentations will be held in KAIS 4018. Each group will have 15 mins to present their work, at the end of which we’ll have time for questions (we’ll adhere strictly to the 15 minute constraint, so please practice your talk).
I’ll provide coffee and light refreshments for the meeting. Please arrive at 8:45 AM to give yourself time to settle down. I’ll call upon the groups in random order, and if you’re not there at that time, you’ll receive a 0 for the presentation.
- Dec 5 : To minimize switching times during the presentation on Dec 8th, please email me your slides in PPT or PDF format by 6 AM on the morning of Dec 8th. I’ll load them up on my laptop prior to the meeting. If you absolutely cannot do this, I suggest you show up at 8:45 AM sharp with a memory stick containing the slides.