Category Archives: teacher evaluation

student evaluations of teaching ~ a fundamental problem for higher education

Student evaluations of teaching (SET) are standard fare in higher education, where there is often relatively little other information available to judge what is supposed to be a major part of professors’ work. Students typically complete long end-of-course surveys, covering everything from whether the content was relevant, to the professor’s competence, to whether students felt fairly treated. Putting students in the role of observers of what happens in classes isn’t a bad idea, and I’ve written about the Measures of Effective Teaching Project here. Not everything that is asked of students is clearly in their realm, though, such as whether the professor knows his/her stuff, and much more care should be given to asking students to observe what is observable and experienced specifically by them. This matters when the data from these surveys are used to make career decisions about professors. The overall evaluation process for tenure, promotion, merit pay, and continuing appointment involves more components than teaching, and indeed scholarship (publishing) is often compensatory and can nullify a poor teaching evaluation. But that’s another post, for another day.

A recent study at NC State points to a specific, and serious, problem with student evaluations of teaching: they are gender-biased. One of the basic issues in instrument construction is validity, and it would appear student evaluations of teaching aren’t living up to this expectation.

“The ratings that students give instructors are really important, because they’re used to guide higher education decisions related to hiring, promotions and tenure,” says Lillian MacNell, lead author of a paper on the work and a Ph.D. student in sociology at NC State. “And if the results of these evaluations are inherently biased against women, we need to find ways to address that problem.”

Benjamin Schmidt, an assistant professor of history at Northeastern University with an interest in data visualization, has given us what might be the most useful data for seeing the effects of gender bias. He used Rate My Professors as the data source, describing his analysis here, and before you get all uppity about this, there is at least some evidence of a reasonably high correlation between ratings on RMP and institutional SETs. Schmidt has created an interactive chart to explore gender, discipline and student ratings. It’s pretty amazing… and eye-opening. I’ve clipped just a few examples of what happens when different terms are used to search the database. Apparently male professors in psychology might have futures as stand-up comedians, all education professors are so caring any of them might be the next Mother Teresa, and all philosophy profs are brilliant but men more so than women. Not all descriptors are gender-biased… attractive, for example, leads us to believe education professors are the least attractive (ouch!) while the hotties are female criminal justice and male philosophy profs… but a surprisingly consistent pattern of gender bias against women across disciplines is apparent.

search for FUNNY

using CARING

BRILLIANT as the search term

Explore this interactive chart and be amazed, even appalled. You’ll find that men are more likely to be seen as a star, knowledgeable or awesome and women are more likely to be seen as bossy, disorganized, annoying and to play favourites, whilst being either nicer or ruder than men. (Wait, is that rudely nice OR nicely rude?)

But then ask if the student ratings have any basis in fact. Are philosophy professors really more brilliant than professors in accounting, business or health sciences? Do women across all disciplines lack a sense of humour? In other words, the problems with these ratings are complex and layered… gender bias, for sure; truth value, disputable. And yet, faculty committees and higher ed administrations persist in using such flawed data, or at least continue to symbolically offer it up as grounds for important personnel decision making.

alternatives to standardized testing

In educational evaluation, the global educational reform movement (GERM) has privileged common indicators of student learning outcomes (used in turn for other evaluation purposes, like teacher evaluation, even if that is not a sound practice). There are many reasons why standardized tests become the norm and are reified as the only fair and legitimate way to know how students and schools are doing. There is plenty of literature that debunks that idea.

However, the narrative of standardized testing as a necessary and legitimate means of judging the quality of learning and schooling is powerful and political. In a short commentary for NPR, reporter Anya Kamenetz nicely summarizes reasonable alternatives, and these are excellent talking points when confronted with the question, “If not standardized tests, what then?” You can read the article, but in summary:

1) use some sort of matrix sampling (a good idea from NAEP)

2) consider ongoing embedded assessments (this is usually computer-based testing)

3) think about what you want to know and it will require multiple measures (in other words, knowing scores in a few subject areas will never be enough, and maybe there are things worth knowing beyond the obvious)

4) start considering novel approaches to assessment, like game based assessment and the not so novel use of portfolios or narrative evaluations

5) think differently about what it means to judge a school and that means looking at more than just narrow student outcomes (school inspections are a notion worth revisiting).
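The matrix sampling idea in item 1 is worth making concrete. The gist: instead of every student sitting the whole test, the item pool is split into booklets, each student answers only one booklet, and results are pooled to estimate how the population does on every item. Here is a minimal sketch with entirely made-up numbers (30 items, 3 booklets, simulated students) — an illustration of the sampling logic, not NAEP’s actual design:

```python
import random

random.seed(0)

# Hypothetical setup: a 30-item pool split into 3 booklets of 10 items,
# so no single student sits the full test, yet every item is covered.
NUM_ITEMS = 30
NUM_BOOKLETS = 3
items = list(range(NUM_ITEMS))
booklets = [items[i::NUM_BOOKLETS] for i in range(NUM_BOOKLETS)]

# Simulated population: each item has a true probability of a correct answer.
true_p = {i: random.uniform(0.3, 0.9) for i in items}

# Each student is assigned one booklet and answers only those items.
responses = {i: [] for i in items}  # item -> list of 0/1 answers
for student in range(3000):
    booklet = booklets[student % NUM_BOOKLETS]
    for item in booklet:
        responses[item].append(1 if random.random() < true_p[item] else 0)

# Population-level estimate per item, pooled across all students.
est_p = {i: sum(r) / len(r) for i, r in responses.items()}
avg_error = sum(abs(est_p[i] - true_p[i]) for i in items) / NUM_ITEMS
print(f"mean absolute error of item estimates: {avg_error:.3f}")
```

The trade-off is the point: no individual student score is meaningful (each student saw only a third of the items), but the group-level picture is accurate — which is exactly why this design suits system monitoring rather than ranking individual kids or teachers.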

The “evaluate that” campaign

I am totally sympathetic with teachers’ reactions to the simplistic, pedestrian ways of evaluating the quality of their work, the quality of student work, and the quality of schools. That efforts are made to reduce complex evaluands to simple ones is a serious problem. The “EVALUATE THAT” campaign identifies important aspects of teaching and education that aren’t measured and are therefore not evaluated… things like compassion, empathy, cooperation… the emotional, interactional content of the work of teaching. [Click here for the heartfelt remarks of one teacher.] The campaign (started by BadAss Teachers, who created the meme shown in this post) also suggests these things can’t be measured and can’t be evaluated. Stories are being aggregated with the Twitter hashtag #evaluatethat.

Whether you are a teacher, student, parent, administrator… tell us, in a brief sentence or two, about YOUR moments of teaching or learning (yours or someone else’s) that were never formally measured but made an impression on you. These ‘bites’ of reality do not have to be all gloriously positive; the only criteria: true, real and not measured (no hypotheticals please).

We are collecting these via Twitter by using the #evaluatethat hashtag in each relevant tweet. This will ensure all of these are kept in one place and can be easily seen by all.

The hashtag has taken on a bit of a f*&k you tone… you can sort of imagine the tweeter grabbing their crotch while they shout “EVALUATE THAT.” Even so, the collection of stories is an important reminder of the complexity of teaching and schooling… a complexity that needs to be incorporated into judgements of the quality of teaching, learning and schooling. While it may be very difficult to measure such things as compassion and empathy, that’s not a reason to step away, but all the more reason to find sound ways of incorporating those behaviours and actions into evaluations.