Two-Stage Exam: Introduction and Resources

Time to dive in! After thinking about them for a long time, this term I’m converting my exams into two-stage exams.

  • Step 1. I shorten the exam so it’s doable in about 2/3 of the testing time slot.
  • Step 2. Students write the exam individually.
  • Step 3. Students immediately — during the same class period — write the same exam again in groups of 4.
  • Step 4. Grade the exams as usual, but 90% of each student’s score comes from the individual exam and 10% from the team exam, with a guarantee: anyone who scores higher individually than their team did gets 100% weight on the individual score (which very rarely happens, so I’m told).
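
For anyone who wants to see the Step 4 arithmetic spelled out, here is a minimal sketch in Python. The function name, the 0-100 percentage scale, and the example scores are my own illustrative assumptions, not part of any official grading tool; only the 90/10 weighting and the "individual score wins if it is higher" guarantee come from the steps above.

```python
def two_stage_score(individual: float, team: float,
                    individual_weight: float = 0.9) -> float:
    """Combine an individual and a team exam score (both as percentages).

    Hypothetical helper for illustration: the default 90/10 weighting
    follows Step 4, and scores are assumed to be on a 0-100 scale.
    """
    if individual > team:
        # Guarantee: a student who outscored the team keeps
        # 100% weight on the individual score.
        return individual
    team_weight = 1 - individual_weight
    return individual_weight * individual + team_weight * team


# Made-up scores: one student helped by the team, one who beat the team.
for ind, team in [(70, 90), (95, 80)]:
    print(f"individual={ind}, team={team} -> final={two_stage_score(ind, team):.1f}")
```

Because the team score can only raise the blended score when it is higher than the individual score, the guarantee means the two-stage format can never lower anyone's grade.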

Why am I making this change?

Four key reasons:

  1. Data. A growing pool of evidence is showing that team tests help students learn. See references below.
  2. Feedback. My classes are very large, so I struggle to give any personalized feedback at all, especially timely feedback. By re-doing the test right away with peers, students get to discuss the questions immediately and come to the right answer (according to the data).
  3. Exam improvement. Based on my evaluations, a small but consistent group of students find my exams very difficult and/or too long. Because I still have only 50-minute classes to work with, this change will force me to shorten my exams, culling and distilling to just the most effective questions that measure deep learning.
  4. Community. I value collaboration and building a supportive community. Research papers and instructors who have used this method report extra benefits beyond learning: students have more rapport with each other and are more willing to participate with their peers in class throughout the term. Also, Gillian Sandstrom and I have a research paper in press showing the more students talk in class, the more they feel like part of a community and interested in the class. So… back to data.

Interested? Here are some quick and effective resources for implementation:

  1. Videos by the CWSEI team depicting Two-Stage Exams in action.
  2. Jones, F., Gilley, B., Harris, S. (2013). Tips for successful two stage exams. The EOS-SEI Times, 6(9). Retrieved
  3. Jones, F., Gilley, B., Lane, E., Caulkins, J., & Harris, S. (2011). Using group exams in your classes. The EOS-SEI Times, 4(1). Retrieved
  4. PHAS-CWSEI Team. (2012). Two-stage (group) exams. CWSEI–PHYS & ASTRO Newsletter. Retrieved
  5. Brett Gilley, aka @ModernHydra


Dahlstrom, O. (2012). Learning during a collaborative final exam. Educational Research and Evaluation: An International Journal on Theory and Practice, 18, 321-332.

Eaton, T. T. (2009). Engaging students and evaluating learning progress using collaborative exams in introductory classes. Journal of Geoscience Education, 57, 113-120.

Gilley, B. H., & Clarkston, B. (2014). Collaborative testing: Evidence of learning in a controlled in-class study of undergraduate students. Journal of College Science Teaching, 43, 83-91.

  • A particularly well-designed example.

Leight, H., Saunders, C., Calkins, R., & Withers, M. (2012). Collaborative testing improves performance but not content retention in a large-enrollment introductory biology class. CBE—Life Sciences Education, 11, 392-401.

  • The title might be alarming here… they showed no effect of the 2-stage exam on final exam performance (compared with material that had been previously tested only with individual tests). I’m ok with this. Not every study is going to find the same effect (particularly ones with some execution oddities like this one), yet this is still a “no-change” effect with no evidence that student learning decreases. Moreover, students still enjoyed the process and found it less stressful than the individual-only tests. No harm done, potential benefits.

Rieger, G. W., & Heiner, C. E. (2014). Examinations that support collaborative learning: The students’ perspective. Journal of College Science Teaching, 43, 41-47.

Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.

  • Two-stage tests might help to fight the negative consequences of MC tests: you remember what you answered (and thought was right), not what actually was right.

Sandstrom, G. M., & Rawn, C. D. (in press/2014). Embrace chattering students: They may be building community and interest in your class. Teaching of Psychology.

Zipp, J. F. (2007). Learning by exams: The impact of two-stage cooperative tests. Teaching Sociology, 35, 62-76. doi: 10.1177/0092055X0703500105




An Excellent TA…

For the past six years, I have asked Psychology’s Teaching Assistants to complete the sentence, “An excellent TA…” both before and after TA Development Day. While completing the program evaluation report for our 2013/2014 TA/TF programming, I pasted the data from almost 200 TAs into Wordle to produce these word clouds. The larger the word, the more frequently it was mentioned. I’m delighted to see student(s) are at the heart of what TAs consider to be excellence!*


The words students/student, course, material, learning, and available jump out to me.

Word cloud: “An excellent TA…” pretest responses, 2008–2013



The words students, professional, organized, knowledgeable, approachable, learning, course, enthusiastic, and responsible jump out to me. Professional is a quality we try to emphasize very much on TA Day, and it seems like TAs are picking up on that theme.

Word cloud: “An excellent TA…” posttest responses, 2008–2013

*I’m not trained to interpret qualitative data without converting it to actual numbers. If you are, and see more meaningful themes in here, please feel free to comment below or email me. I’d love to hear your thoughts!

2013/2014 Student Evaluations Response Part 4: Psyc 218

Thank you to each of my students who took the time to complete a student evaluation of teaching this year. I value hearing from each of you, and every year your feedback helps me to become a better teacher. As I explained here, I’m writing reflections on the qualitative and quantitative feedback I received from each of my courses.

This is my second time teaching Psyc 218: Analysis of Behavioural Data, and I must say, I truly love teaching students this course. I didn’t make many changes from the last time, with two exceptions: (1) I attempted to even out the midterm difficulty in response to feedback and self-analysis from last time, (2) I integrated a new reading and treatment of the “New Statistics” debate/movement going on in our discipline right now. I didn’t change the textbook, overall teaching methods (largely lecture punctuated frequently by clicker questions where students practice calculations and interpretation), or the assignments (which we all have to do across sections).


Like last time, quantitative feedback was quite positive, and even a bit higher than last time (e.g., 4.7/5 on average for the “overall efficacy” item, up from 4.4; see the graph above). Qualitative feedback helps me to figure out specifics of what’s going well and what isn’t. Overwhelmingly, the most common comments noted the energy and enthusiasm I bring to class – which was awesome. I really had fun every day, so I’m glad I can help students build positive vibes toward statistics.

Quite a few students commented on the high expectations I have for them: indeed, this course is challenging. Some students felt they were appropriately supported to rise to the challenge, whereas others felt pushed a bit too far. Given the differences there, I think the course is probably pitched at an appropriate level, at least when coupled with the way I handled it.

Quite a few students asked for more problems to practice with. There are about 30 in the book at the end of each chapter, and every day in class there are clicker questions. However, I know that very few of these problems are at the high level of difficulty I ask for on exams. I wonder if I could offer students a model of how to take a research paper (of which there are a gazillion they could find on their own) and turn it into a problem set. Is there a common set of questions students could ask to help them link the course material to a research article? Hmm. Think more about how to help students learn how to make their own problems/examples. That will be more useful in the long run for the students than me trying to come up with a whole pile of artificial ones.

I was delighted with the number of students who mentioned that incorporating the New Statistics (vs. NHST) framework was motivating and engaging. In the context of discussing these current issues, one student wrote,

“I think that’s really important in terms of training us to be strong, statistically sound researchers. As well, it made me feel like I’m truly part of this field, instead of just being a student taking a course. Overall I would give Dr. Rawn an A+ because she has truly changed the way I think about statistics and about the field in general.”

The identity shift that this student in particular expresses is profoundly important to me. This and similar comments bolster my belief that incorporating current issues in statistics and research methods is crucial when training the next generation of psychological scientists. Overall, I’m quite pleased with how this course went, and the students seem to be as well.

2013/2014 Student Evaluations Response Part 3: Psyc 217

Thank you to each of my students who took the time to complete a student evaluation of teaching this year. I value hearing from each of you, and every year your feedback helps me to become a better teacher. As I explained here, I’m writing reflections on the qualitative and quantitative feedback I received from each of my courses.


Of all the courses I teach, Research Methods is my oldest. Over the past 7 years I have taught 12 groups of students (N = 846)! The core design has largely stayed the same, but I have made many changes on the basis of my own reflections, my evolving knowledge of the topic and our discipline, and, crucially, feedback from students. Quantitative feedback from the student evaluations has remained high this year (see the graph below for a comparison across years).


Now that I have used my own textbook for two years, I was able to explore quantitative ratings on the item “How would you rate the contribution of textbook(s) and assigned readings to this course?” As noted in the graph below, there seems to be a small shift favouring Cozby & Rawn, Canadian Edition (M=4.10 across four sections), over Cozby’s older editions (M=3.84 across 8 sections). Is the textbook perceived a bit more positively because it’s a better book or because I’m both an author and the course instructor? To this point, a few students made comments like this: “I really liked that you were the author of the textbook, it helped connect the course to the reading.” This kind of comment makes me wonder if my colleagues’ data show this shift as well. (I need a control series design rather than just an interrupted time series!)

Psyc 217 contribution of textbook

The Cozby & Rawn textbook is the core text for this course, but I’ve always supplemented the nuts-and-bolts style book with secondary readings. Up until last year, all secondary readings were from the Stanovich text, but they had repeatedly been received poorly by students (see last year’s reflection). This year, I replaced most Stanovich readings with a few articles highlighting major issues being hotly discussed in our field (e.g., replication). Student feedback about this change was quite positive. Some students noted how the new topics/readings helped them understand current issues in the field, whereas others appreciated fewer readings overall and fewer Stanovich ones. I also experienced the change as a productive and helpful one that improved the course. Thanks to past years’ student feedback for triggering that change!

When I step back and look at the overall set of comments, there are some topics mentioned repeatedly. The one criticism that emerged was the midterms: a number of people found them too long and/or difficult. Because the class averages are in the range required by our department, the exam difficulty seems commensurate with student learning in my course. There were also some requests for a study guide, which is interesting because one does exist. I can’t vouch for the quality of it, but it’s available: I’ll make a note to advertise its existence in the syllabus. On the positive side, students found my enthusiasm and approach helpful, including sharing my past mistakes and communicating clear goals/learning objectives. Here’s a comment that sums up some of these themes:

“Who knew research methods could be so interesting? Lectures were consistently energetic and engaging and always promoted critical thinking. I really love that you incorporated contemporary issues in the psychology field into the course content – it has been very useful in interpreting content outside of this class. Continue to do that! Thanks for a great class!”

For next year, I’d like to strive to lecture less and have more in-class activities where people are using the material. I use active learning techniques frequently, but there are some topics that could benefit from revision (quasi-experiments come to mind). I also intend to revisit the supplementary readings, see if I can replace the few remaining Stanovich chapters with articles available online, and update the “current” readings with ones published since last year (e.g., drawing from an even more recent Perspectives on Psychological Science issue). I’m also working on some bigger things that would span all sections, like adding a big data collection evening, and publishing all the abstracts in some form for next year’s students to be able to refer to.

2013/2014 Student Evaluations Response Part 2: Psyc 208

Thank you to each of my students who took the time to complete a student evaluation of teaching this year. I value hearing from each of you, and every year your feedback helps me to become a better teacher. As I explained here, I’m writing reflections on the qualitative and quantitative feedback I received from each of my courses.


This was another year of relatively major changes in this course. I have revised and revised this course over the years. This year, I needed to replace the readings anyway, so I used it as an opportunity to do a thorough course evaluation. Last summer I carefully considered this course with respect to Dee Fink’s model of course design. In brief, Fink’s model prompts an analysis of the degree to which the learning objectives/course goals, learning assessments, and teaching techniques are integrated with each other. I presented the results of this analysis at a conference (follow this link for the conference presentation support materials). What I learned from this analysis is that my class-by-class learning objectives (and therefore my exams) were really only addressing two of my broader course goals.

Major changes in 2013/2014

  • Revised most readings. Created a new custom set of readings from only one publisher, omitting most of the sport psychology chapters that many students had had trouble connecting with in previous iterations.
  • Revised topic sequence, in-class topics, and exams to align with new content. This meant re-arranging some topics, reframing others, cutting a few entirely, and creating a few new lessons on new topics.
  • Instead of using the Team Based Learning style team tests for two units (one of which was now gone entirely), I created a “Learning Blitz” to serve the same sort of readiness assurance process. In brief, students came with readings prepared, then worked on questions that guided what they were to take from the readings (e.g., keywords, key studies, take-home message). My intent was to help students learn to extract the most important information from readings, while working together.
  • As I said I would in response to last year’s feedback, I created an exam study guide that I distributed to students the week before each exam. It collected all learning objectives, keywords, key studies, etc., together in one place as a sort of “here’s what to know” from class and the readings.
  • The TA who had helped me develop the course over four years graduated. Two new TAs were assigned to my course. They were keen to help support the course, but we did hit some snags.

As you can see, this was a big year in the life of this 208 section. Personally, I felt challenged by the sheer amount of revision needed. When I consider my course design intentions, I think I inched toward integrated assessment and teaching techniques (still lots of room to grow there), and better aligned my course goals and learning objectives with assessments. I also realized just how much work my former TA did to ensure feedback and support were given in a timely way to each group, and to ensure consistency of grading with her fellow TA (who changed most years). I need to be better prepared with a process for communicating more effectively and regularly with TAs, and helping them work together to ensure coordination throughout the grading process.


Quantitative student feedback was on par with previous years (see the graph above), but qualitative comments tended to strike a different tone. Many students commented positively about how motivated they were to come to class, how much they enjoyed my teaching style and the activities that we did to encourage them to apply the material to their lives. Some students mentioned that I created a “positive learning environment” and was “engaging” and “inspiring.” These comments were consistent with previous years, and I’m glad that many students are finding value in this course and my approach to it.

The suggestions for improvement seemed related to the changes I made, and fell largely into two categories: grading and content. Commonly, students commented that the midterm exam and assignment grading was difficult. There was frustration with the required means – I was frustrated by that too. These means were more salient perhaps than in previous years because of how I handled a couple of things: instead of asking my TAs to revise their grades on an assignment to better align with each other, I scaled them quite explicitly (i.e., one half the class had a +3 boost, the other had a -5 reduction; on the midterm, I scaled +7 for everyone). The midterm difficulty was an overshoot because of the revisions with the new material (not an unusual occurrence). What I wish I had done with the assignment was ask the TAs to take a couple of extra days and revise their grades to come to a common acceptable mean. It would have had the same effect on the grades, but the process would have reduced the salience of the scaling problem. As it stood, half the class seemed to feel like they were punished – when in fact they were simply over-graded initially. Process is crucial. Lesson: Carefully ensure TAs are communicating regularly and are aligned throughout the grading process for the assignment.

The handful of comments on the content surprised me a little. One student mentioned high overlap in the content between this course and some others (although also noted that the applied take on it was new). A few students mentioned that they desired more depth of theory/research and less application. One person phrased it like this, “I know that Dr. Rawn really enjoys research, so I am confident that she teaches us things that has research to support it…. I wish the course focused more on helping [us] understand definitions, and different approaches so I could make connections between material and life myself.” This feedback surprised me. I feel like I am constantly describing studies, but the fact that a few students made similar comments means that maybe this course is starting to come across as preachy (this is how to live a good life and how I apply it and you should too), and, perhaps consequently, less rigorous. I’m not sure what to do with this feedback, but it’s certainly something to think about further.

Other useful suggestions for next year:

  • Clarify and simplify the group project handout. It has been updated each year for a few years, so it reads a bit patchy. Give the rubrics ahead of time. Like last year, I’m nervous about grading – but perhaps use the rubrics as a base to structure the handout.
  • Offer half a lab day about a week before the presentation (maybe cut Lab 3 into two half days?)
  • Have some sort of control over the chaos that is the presentation. Maybe have a bell every 15 minutes – could I bring someone in to do that? I’m busy grading.
  • Shorten the learning blitz requirements: they’re too long for meaningful discussions and some groups are reporting splitting the workload rather than discussing each item together.
  • Consider pairing up groups, for example by encouraging even-numbered teams to match with an odd-numbered team: something that helps people meet new people other than their teammates every once in a while.

Thank you to everyone who provided feedback. This course, more than any other I teach, goes through growing pains regularly, and this year felt like a big growing pains year. I have a lot to think about revising as I move forward, and also a lot of success to celebrate. This deliberately unconventional course – although not everyone’s cup of tea – does seem to be reaching a subset of students in a very positive way.


This work by Catherine Rawn is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 Canada.