
Scaling Down

This practice doesn’t seem to be used at all in science, but it occurs all the time in psychology, where departments want a set average for all classes. I’m pretty vehemently opposed to scaling people down on exams, and I think most students agree. However, it’s still used, and I wanted to really look into the reasons both for and against it.

It seems that the main purpose of the practice is to prevent mark inflation. If you let students do well, the argument goes, marks will simply climb further every year, and soon enough everyone will be getting 100% on exams (or at least an 85). Standards across Canadian schools will be shattered as everyone scrambles to give students higher marks to guarantee their entry into graduate programs. Chaos will ensue as schools try to identify which students are actually good at the subject; undeserving students will get in at the expense of capable ones, and the genius will be lost among the mass of mediocrity. Or will that actually happen?

Well, first of all, under what circumstances do professors scale marks down?

1.) The exam was too easy, and too many people did well.
-I would argue that this is the fault of the professor. If you design an exam that any sentient being who can read and understand English and apply basic principles of logic can pass, it’s a problem with the exam. OK, I know this is an exaggeration, but still: why are we punishing students for the mistake of a professor?

2.) The exam was hard, but students studied hard and did well.
-There are classes where students know the material is going to be hard. There are also classes where there just happen to be lots of good students (honours classes, for instance) who might study hard and do well. It does happen, and when it does, I don’t think those students should be scaled down, as they could legitimately know and understand the material better.

I would argue that the doomsday scenario profs envision won’t actually materialize.

First of all, there’s nothing to say that a high average one year, or even on one exam, will create this upward spiral of grading. Just because one year’s class happens to do well doesn’t mean that this is going to be an ongoing trend. Setting the class average to 65% every year simply doesn’t account for natural variation in the student population. You might get a brilliant group of students one year and a terrible group the next. If the aim is to compare students from year to year, then a fixed average doesn’t do much to establish a scale of comparison, as the average is so sensitive to individual marks.

I also don’t see anything wrong with people doing well on an exam that’s designed to test their knowledge of the material (i.e. a fair exam). Theoretically, in a good class, students will study hard for an exam and do well as a result. Sure, not everyone will study, and those who don’t will do worse, score below average, and probably do significantly (p<0.05) worse. Even if the instructor did compose an exam that could be passed by a student who never attended class or did any of the readings, relying only on previous knowledge, scaling down only penalizes the people who did well and who studied hard. Sure, it might be unfair that the person who studied a lot did only as well as the person who studied little, but that happens even on difficult exams, and it’s still unfair to scale the hardworking person down. Plus, if you are scaled down because the exam wasn’t hard enough, there’s nothing to say that you wouldn’t have done just as well on a harder exam. There’s also always the option of making the next exam slightly more difficult, so that it actually tests students’ learning and understanding of the material rather than things like previous knowledge, if that was the initial problem. If the average is still ‘too high’ on the harder exam, I would argue that students who perform well on it still deserve good marks, regardless of what the average was ‘supposed’ to be. And since most exams are quite similar from year to year, it makes no sense to scale a class down on an exam that achieved the targeted average in previous years; all a high mark indicates is that students learned the material well.

Scaling down, then, simply makes the marking unjust and removes the motivation to study, because your mark and your effort in the course are no longer correlated: the mark you get for the course no longer reflects the effort you put in or the knowledge you gained over the term, but rather the average your professor wanted the class to attain.

Then there’s the second issue: not being able to separate the genius from the simply smart from the average if everyone ends up doing well on an exam. My first qualm with this belief is that I just don’t think there are all that many geniuses floating around, and those who are indeed of superior intelligence will be able to prove themselves in some other way (through other projects, during their grad school interviews, through reference letters, etc.). But more importantly, if you’re scaling everyone down, you’re presumably scaling the geniuses down as well, no? Or, if you’re not using a uniform scale, and you give the person who got a 98 a 92 but let the person who got 100 keep that mark, you’re essentially saying that the person who got 100 is smarter than the person who got 98. The initial difference, though, could have come down to one student getting one more question right, and that could be the result of a random guess and simple luck rather than knowledge of the subject matter. So if we’re really trying to separate the geniuses from the group, we should devise a method that doesn’t involve punishing everyone else in the group and that relies less on chance and luck. I’d also argue that tests shouldn’t be targeted at the 0.00001% of us who are brilliant, but that’s a tangent for another day.

So why do profs scale down? Why have a target grade? Presumably to ensure that the average is consistent across all schools, or to prevent high marks from becoming meaningless. I have several responses to this. First of all, unless there’s some sort of pact between schools or departments to set an average, there is nothing to prevent one school from deciding to give all its students marks of 80% and up in any given course. A good application review system will notice this and probably judge applications either on other factors or by looking at how the student did compared to the class average. Second, there’s the notion that marks become meaningless when they’re too high, and I’d argue that this isn’t the case. I find that, on the whole, it’s not all that difficult to do well in a psychology class if you actually a.) come to class, b.) listen in class (this seems to be a problem for some, which is why b.) is its own category; some people like to learn through osmosis and come to class to sleep), c.) do the reading, and d.) do some studying (i.e. memorize some material and understand it). Provided the student understands and knows the material, this is normally enough to get at least an 80. (I think. I haven’t done the stats or anything, so this bit is more of a conjecture, but I think it must be at least somewhat true in psych, although I don’t know the error margins.) In this case, I don’t think the mark is meaningless; I think it’s just something the student can be proud of. I’m also not talking about the GPA booster courses, where you’re asked things that seem to be common knowledge (believe it or not, things like “does the Sun revolve around the Earth?”; the answer is at the bottom of the page, if you are unsure). But even in those courses, people somehow manage to fail the exam, even without the prof scaling down… It is not unfathomable, however, that everyone follows these four steps, so it shouldn’t be a surprise when students do well. Low target averages are based on the preconceived notion that students will slack off, not study, or be generally unintelligent, all of which I would dispute. So why base expectations on preconceived notions that are bound to fail every once in a while?

I think my argument is pretty clear (maybe not as clear as it would be had I written this at any decent hour, but clear enough nonetheless): don’t scale down. I’d further propose that marks in general aren’t always indicative of either learning or achievement in a class. I’d also argue that marks on transcripts should be given in relation to the class average, which would allow application reviewers to really judge how well the student is doing in a class. If the transcript indicates that the student scored an 85 on an exam where the average was 80, that says something different about the student than an 85 where the class average was 60. I’d be interested in seeing some of these ideas argued, at the very least. They might well reduce the problems people have with mark inflation, and might serve as better indicators of students’ performance, which, unless I’m hugely mistaken, is what the marking system is trying to ensure in the first place.
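To make the transcript proposal a bit more concrete, here is a minimal Python sketch of reporting a mark relative to the class average, using the two 85s from the paragraph above. The function names are hypothetical, and the z-score variant assumes the transcript would also list the class standard deviation, which the post doesn’t actually propose; the 10-point spread is made up purely for illustration.

```python
def relative_mark(mark: float, class_average: float) -> float:
    """Points above (positive) or below (negative) the class average."""
    return mark - class_average


def z_score(mark: float, class_average: float, class_sd: float) -> float:
    """Standard deviations above or below the class average.

    Hypothetical extension: assumes the class standard deviation would
    also be reported, which the post itself does not suggest.
    """
    return (mark - class_average) / class_sd


# The two scenarios from the post: the same 85 against different averages.
print(relative_mark(85, 80))  # 5
print(relative_mark(85, 60))  # 25

# Made-up spread of 10 points in both classes.
print(z_score(85, 80, 10))  # 0.5
print(z_score(85, 60, 10))  # 2.5
```

The raw difference is the simplest form of the idea; dividing by the spread additionally reflects how tightly bunched the class’s marks were, so the same gap above the average counts for more in a class where marks barely varied.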

Oh, and the answer is no. The Earth revolves around the Sun. For the physicists out there, I know this is probably a simplification, and if you look at the world in 16 dimensions, you can arguably say that there’s some sort of complex pattern of movement where the two bodies, based on Someone’s Law, revolve around one another, or something along those lines. But for our purposes, “no” is the answer.

By Neal Yonson

Neal Yonson is a native of Ottawa. He graduated from the University of Toronto with an Hon.B.Sc. in 2006 and will be happy to tell you about how things seemed to work more smoothly there. After traveling across the country for free (protip: strategically arrange grad school visits where they reimburse travel) he came to UBC to start a Ph.D. in chemistry. He was quite happy to avoid student issues until he found out how much it cost to go to the BirdCoop. Since then, he has been involved with a variety of advocacy projects.


8 replies on “Scaling Down”

As a graduate student I grapple with a version of this problem regularly in marking undergraduate tests and papers. I don’t scale down per se, but I do pay a lot of attention to making sure the marks for the class come out within a fairly narrow range. I can tell you right now that the average for the course I’m TAing this fall will be 70 ± 2 percent or so. The average on each individual test will be ~70% as well. The instructor would get a talking-to from the faculty if this were not the case. When I mark tests, I start by reading a dozen or so to get an idea of the quality of the answers and put the tests in rank order. Then I basically say the best test was an A and work from there, with some deviations for exceptionally good and exceptionally bad work. So I don’t scale down, but the effect is the same.

If you grant that marking is a worthwhile endeavor AT ALL, then I think this is the only way to go. There is no objective way of determining whether this year’s test is harder than last year’s, if this batch of students is smarter than some previous batch, or if this professor is a better teacher than that professor. The most reasonable assumption is that each batch of students is roughly equal.

You are correct that if the test is too easy, that is the professor’s mistake. However, it is a mistake that needs to be rectified. It is not “punishing” students if the students end up with a mark that is equal to what they would have received if they had written a test that was more difficult. It is simply correcting the professor’s error.

Under the current university paradigm, grades are the primary determinant of admission to graduate school, professional programs, etc. Grades need to be comparable from year to year to some degree, because grad schools are not interested in your “absolute” level of knowledge or skill, but in your level compared to other students. It is much easier for them to compare if grades remain constant.

An alternative could be to just rank students in the class and not grade them. Then you could say you were in the top 5% or whatever and scaling would never be an issue. I think India does something like this. But it all amounts to the same thing, except maybe for your ego or your parents.

Personally I think it’s all a lot of hogwash, but under the current system curving or scaling is the only way to go.

I have no idea what my colleagues in Arts do, but I do know what kind of scaling goes on in a large, multi-section course in some parts of Science. In Mathematics, for example, we often use the common final exam as a normalizer: in order to ensure general fairness and to “correct” for the occasional midterm that is either too hard or too easy, we scale an individual class’s term marks to match that class’s exam mark distribution. Unless the students had Prof. Goodtimes, who gave them all 90s, this sort of scaling usually goes up, not down, in my experience. In general, students get the benefit of the doubt, particularly when it comes to deciding a pass or fail.
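For anyone curious what “scale term marks to match the exam distribution” might look like, here is a rough Python sketch. The comment doesn’t say exactly how the scaling is done, so this assumes a simple linear match of the mean and standard deviation; the function name, the clamping to 0–100, and the sample marks are all hypothetical, not the department’s actual procedure.

```python
from statistics import mean, pstdev


def scale_to_exam(term_marks: list[float], exam_marks: list[float]) -> list[float]:
    """Linearly rescale one section's term marks so their mean and spread
    match that same section's common-final-exam marks.

    Assumption: a plain mean/standard-deviation match, clamped to 0-100.
    """
    t_mean, t_sd = mean(term_marks), pstdev(term_marks)
    e_mean, e_sd = mean(exam_marks), pstdev(exam_marks)
    if t_sd == 0:
        # Every term mark is identical: just move them all to the exam mean.
        return [min(100.0, max(0.0, e_mean)) for _ in term_marks]
    scaled = []
    for m in term_marks:
        z = (m - t_mean) / t_sd   # position within the term marks
        s = e_mean + z * e_sd     # same relative position on the exam scale
        scaled.append(min(100.0, max(0.0, s)))  # keep within 0-100
    return scaled


# Hypothetical section whose midterm was too hard relative to the common final.
term = [50, 60, 70]
exam = [65, 75, 85]
print([round(m, 1) for m in scale_to_exam(term, exam)])  # [65.0, 75.0, 85.0]
```

Because each student keeps their relative position, the section’s rank order is preserved; only the level and spread change, which is why a too-hard midterm gets pulled up to where the common final suggests the class actually is.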

I know of other first-year science courses where all sections are expected to have the same average. I find this puzzling, since the evidence I see points to variations between sections due to the use of previous grades to determine registration date: some sections just have better students, on average. However, my colleagues in charge of those courses seem to have chosen a path that leads to the fewest student complaints about fairness.

The issue here is not some mythical need on the part of professors to knock students down. In many cases, professors are trying to mitigate differences between themselves in an attempt to be fair. At the same time, we attempt to have the grades communicate something about your learning. An A should mean mastery of the material. A B should mean solid effort with a good level of understanding. A C should mean you have covered the basics, but probably not to a very deep level. A D is generally a bare pass, and an F means you haven’t even mastered the basic elements of the course.

It is interesting to me that some students seem willing to spend hours with me (were I to let them) discussing the 1 or 2 marks they lose on a question on a midterm, but these same students never make an appearance to discuss the actual content of the course.

Of course, ultimately, I would be happier with a grading system that only used letters. Imagining that I can assess a student to +/- 1 mark and have it mean anything is ludicrous. Perhaps I could give A, B, C, D, etc., but maybe I’d be more comfortable with Pass, Fail, First (and yes, I feel confident I can tell when a student has actually mastered something). I see no reason to have quotas on grades, but I imagine human nature would make Firsts a rare thing.

I think the other problem with scaling down in psychology specifically is that most of the exams are multiple choice, so there’s no way to rank students or to mark differentially according to the strength of an essay, etc. You either get the question right or you get it wrong, which is partly why I think scaling down becomes a problem. When you get the questions right but essentially have marks deducted for correct answers, it hardly seems fair. I know I throw the term around a lot, but I can’t really think of another word for it. You knew the material; you got the question right. I know there are problems with MC exams, but putting those aside, you can’t really rank-order exams in this case, because marking them is a lot less subjective.

In terms of rank-ordering, however, what happens when in one year a student would have gotten, say, an A+, but the next year, because of a group of students who really excel in the subject, that same person would get a B+? Or even an A? It seems wrong in that circumstance to base the marking scheme on what the other students in the class are like. As Mark mentioned, not all sections are created equal, so why assume that they are, and that they are the same from year to year?

In terms of figuring out which test is harder, aren’t some of the questions reused? If they weren’t, why wouldn’t we be able to keep our exams? The probability of students remembering enough questions for long enough to write them down is fairly small, I’m sure.

In terms of rectifying the professor’s mistake, why is that often done on the very exam that was written to be too easy? Granted, you can create a more challenging exam the next time around, but you can’t really bring students down on the exam that was made too easy, can you?

Regarding comparing your level of performance: do grad schools then see the class average? And is class difficulty taken into account?

And I completely agree, Mark: I think percentage designations are sort of silly, especially when they’re translated into GPA calculations on a 4.0 scale, where a 1% difference can bring you down a lot. I don’t personally see a big difference between an 89 and a 90, or an 84 and an 85, etc. I don’t understand why marks are given out in percentages and converted into letter grades when a small, non-significant difference in percentage can create such a big difference in letter grades or GPAs.

Yes, your class average is printed on your transcript.

A big issue with just letting an overly-easy exam’s marks stand is for calculating your overall average. The students who were lucky enough to be in courses where professors mistakenly gave too easy exams would have their averages unfairly inflated with respect to those students who did not take those classes.

Ideally a scaled-down exam would bin students appropriately, so the top 10% were still in the A range, only about 5% failed, and most are in the B- or C+ range. It wouldn’t just subtract 5% from everyone’s grades so that nobody at all got an A and all of a sudden a ton of people fail.

Overall, I’ve come to terms with scaling as part of what I’ve signed up for. My first- and second-year science courses are good for giving general background, but if I wind up using any of this material later on, it’ll be in the more rigorous and detailed form I learn later. I’ve more or less accepted that what I’m doing now is sitting a long series of aptitude tests designed to stratify my year; tests that, luckily, are about stuff I like. Certainly some years or sections may be better than others, but I don’t see a good way of accurately determining that, so the fairest approach is to assume that any large enough group of students subject to similar admission requirements is roughly equal.

You know, normalizing grade distributions over a class population becomes more and more accurate as N increases…

Just saying.

I personally have no qualms with down-scaling. I lost faith in the grading system a couple of years back. It’s a dirty secret that there’s a pretty weak correlation between grades and ability to learn and perform, and a pretty strong (negative) correlation between grades and significant life events.

D’s likely mean you had an unlucky year, not that you’re dumb. It’s for that reason, and because first-year grades are generally ignored by most of society, that I support a shift to the MIT system of putting all of first year on pass/fail. Two terms, two grades. Spend enough time learning how universities work and exploring your options for the next four years, while putting in the amount of work you deem necessary to stay afloat next year. We’d have to re-evaluate how we admit students into specific programs, but I’d argue the current system is inadequate as is, for the reasons above.

I find it odd that students actually argue for scaling down. I understand the need for scaling up, or for making a subsequent exam harder, but I have yet to hear a good argument for scaling down itself.

Marks may not be a good indicator of what learning actually happens, but I don’t think that justifies lowering someone’s mark on a given exam or in a course in order to achieve some sort of magic number. Like it or not, marks are something that grad schools look at, so I think it’s important to ensure that students are treated fairly. Writing it off as something that’s just not indicative and thus shouldn’t matter sort of undermines the whole point of trying to create a system that treats the student in the fairest possible way. I think that cynicism about the whole approach is something that prevents positive change.

And I personally think that a D implies a lack of effort in most courses. From my (albeit limited) experience, the D students are the ones who don’t bother showing up to class or putting in the effort, and who then often complain about how unfair marking is. Although luck plays into it, Ds across the board for all of one’s term marks imply something other than just an unlucky year.

“And I personally think that a D implies a lack of effort in most courses”

My understanding of the research disagrees with you here, Maria. That said, I haven’t gone out of my way to write a paper on the subject.

If I could redo the system overnight, grades would be made more subjective, and standardized tests would be the determiner of progression through advanced education.

The solution to the problem you’re addressing is not to ban practices like scaling down, but to have departments provide more input and oversight in the grading practices of their faculty to achieve fairness across courses.
