Category Archives: Educational evaluation

student evaluations of teaching ~ a fundamental problem for higher education

Student evaluations of teaching (SET) are standard fare in higher education, where there is often relatively little other information available to judge what is supposed to be a major part of professors’ work. Students typically complete long end-of-course surveys covering everything from whether the content was relevant to the professor’s competence to whether students felt fairly treated. Putting students in a role as observers of what happens in classes isn’t a bad idea, and I’ve written about the Measures of Effective Teaching Project here. Not everything that is asked of students is clearly in their realm, though, such as whether the professor knows his/her stuff, and much more care should be given to asking students to observe what is observable and experienced specifically by them. This matters when the data from these surveys are used to make career decisions about professors. The overall evaluation process for tenure, promotion, merit pay, and continuing appointment involves more components than teaching, and indeed scholarship (publishing) is often compensatory and can nullify a poor teaching evaluation. But that’s another post, for another day.

A recent study at NC State points to a specific, and serious, problem with student evals of teaching: they are gender biased. One of the basic issues in instrument construction is validity, and it would appear student evaluations of teaching aren’t living up to this expectation.

“The ratings that students give instructors are really important, because they’re used to guide higher education decisions related to hiring, promotions and tenure,” says Lillian MacNell, lead author of a paper on the work and a Ph.D. student in sociology at NC State. “And if the results of these evaluations are inherently biased against women, we need to find ways to address that problem.”

Benjamin Schmidt, an assistant professor of history at Northeastern University with an interest in data visualization, has given us what might be the most useful data for seeing the effects of gender bias. He used Rate My Professor as the data source, describing his analysis here, and before you get all uppity about this, there is at least some evidence of a reasonably high correlation between ratings on RMP and institutional SETs. Schmidt has created an interactive chart to explore gender, discipline and student ratings. It’s pretty amazing… and eye-opening. I’ve clipped just a few examples of different terms used to search the database. Apparently male professors in psychology might have futures as stand-up comedians, all education professors are so caring any of them might be the next Mother Teresa, and all philosophy profs are brilliant but men more so than women. Not all descriptors are gender biased… attractive, for example, leads us to believe education professors are the least attractive (ouch!) while the hotties are female criminal justice and male philosophy profs… but a surprisingly consistent pattern of gender bias against women across disciplines is apparent.

search for FUNNY

using CARING

BRILLIANT as the search term

Explore this interactive chart and be amazed, even appalled. You’ll find that men are more likely to be seen as a star, knowledgeable or awesome and women are more likely to be seen as bossy, disorganized, annoying and to play favourites, whilst being either nicer or ruder than men. (Wait, is that rudely nice OR nicely rude?)
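Under the hood, a chart like Schmidt’s boils down to a simple computation: for each discipline-and-gender cell, how often does a word appear per million words of review text? Here is a minimal sketch of that calculation ~ the reviews, disciplines and counts below are entirely invented for illustration, not drawn from Schmidt’s actual data.

```python
from collections import Counter

# Toy corpus of (discipline, instructor_gender, review_text) tuples.
# All of this data is made up purely to illustrate the computation.
reviews = [
    ("philosophy", "male", "brilliant lecturer brilliant ideas"),
    ("philosophy", "female", "organized and clear"),
    ("education", "male", "caring and supportive"),
    ("education", "female", "so caring truly caring"),
]

def rate_per_million(term, discipline, gender):
    """Occurrences of `term` per million words in one discipline/gender cell."""
    words = []
    for d, g, text in reviews:
        if d == discipline and g == gender:
            words.extend(text.lower().split())
    if not words:
        return 0.0
    return Counter(words)[term] / len(words) * 1_000_000

print(rate_per_million("brilliant", "philosophy", "male"))
```

Normalizing by total words (rather than raw counts) is what makes comparisons across cells of very different sizes meaningful ~ male philosophy professors have far more reviews than, say, female engineering professors.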

But then ask if the student ratings have any basis in fact. Are philosophy professors really more brilliant than professors in accounting, business or health sciences? Do women across all disciplines lack a sense of humour? In other words, the problems with these ratings are complex and layered… gender bias, for sure; truth value, disputable. And yet, faculty committees and higher ed administrations persist in using such flawed data, or at least continue to symbolically offer it up as grounds for important personnel decision making.

alternatives to standardized testing

In educational evaluation the global education reform movement (GERM) has privileged common indicators of student learning outcomes (used in turn for other evaluation purposes like teacher evaluation, even if not a sound practice). There are many reasons why standardized tests become the norm and are reified as the only fair and legitimate way to know how students and schools are doing. There is plenty of literature that debunks that idea.

However, the narrative of standardized testing as a necessary and legitimate means of judging the quality of learning and schooling is powerful and political. In a short commentary for NPR a reporter, Anya Kamenetz, nicely summarizes reasonable alternatives, and these are excellent talking points when confronted with the question, “If not standardized tests, what then?” You can read the article, but in summary:

1) use some sort of matrix sampling (a good idea from NAEP)

2) consider ongoing embedded assessments (this is usually computer-based testing)

3) think about what you want to know and accept that it will require multiple measures (in other words, knowing scores in a few subject areas will never be enough, and maybe there are things worth knowing beyond the obvious)

4) start considering novel approaches to assessment, like game-based assessment and the not-so-novel use of portfolios or narrative evaluations

5) think differently about what it means to judge a school, which means looking at more than just narrow student outcomes (school inspections are a notion worth revisiting).
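The matrix sampling idea in point 1 is easy to illustrate: instead of administering the full test to every student, the item pool is split into blocks, each student answers only one randomly assigned block, and population-level estimates are recovered by pooling across students. A toy sketch, with an invented item pool and invented responses (real matrix sampling designs, like NAEP’s, are considerably more sophisticated):

```python
import random

random.seed(0)

# A pool of 12 test items, split into 3 blocks of 4; no student sees them all.
items = [f"item{i}" for i in range(12)]
blocks = [items[i:i + 4] for i in range(0, 12, 4)]

# Each of 300 students is randomly assigned one block.
# Responses here are random 0/1 stand-ins for real scored answers.
students = 300
responses = {item: [] for item in items}
for _ in range(students):
    block = random.choice(blocks)
    for item in block:
        responses[item].append(random.randint(0, 1))

# Pooled across students, every item still gets an estimated proportion
# correct, even though each student answered only a third of the pool.
p_correct = {item: sum(r) / len(r) for item, r in responses.items() if r}
print({k: round(v, 2) for k, v in list(p_correct.items())[:3]})
```

The trade-off is the point: the system gets broad coverage of the curriculum at the population level while each individual student bears only a fraction of the testing burden ~ which is also why matrix-sampled results describe schools and systems, not individual children.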

Reporting evaluation results ~ the case of school report cards

Report cards are a particular example of reporting evaluation results ~ telling students and their parents how well they are doing at what they are supposed to be learning. Typically, report cards go home three times in a school year, and this time of year is about when the first one arrives. I did a Q & A about report cards, and had this to say:

A UBC education expert explains why rewarding good grades is an automatic fail

Whether it’s straight As or straight Cs, UBC education professor Sandra Mathison says parents should ask plenty of questions about a student’s report card, and cautions against rewarding good grades.

How useful are report cards?

Prof. Sandra Mathison

There are different perceptions about what types of report cards are useful. What educators see as being appropriate is often not a view shared by parents. This is a fundamental contradiction.

Parents see marks within a framework of competition. Knowing that your child is getting an A means they’re in the top group, and if they get a C, then your child is doing a lot worse than some classmates.

For educators, the evolution of report cards is about providing information that is substantive and communicates, more specifically, the knowledge and skills that students are acquiring and demonstrating. That requires more detail. For example, educators can break “reading skills” down into phonemic awareness, comprehension, vocabulary, and so on.

Having detailed information communicates more clearly what kids know, and the move in most report cards is towards that. But parents often push back and say, “That’s too much information. I just want to know if my kid is doing OK.”

Are letter grades helpful?

Most of the time letter grades represent big categories and are therefore devoid of much meaning. If a high-school student gets a B in chemistry, what does that mean? Chemistry is complex. It has many components. Does the student understand the concepts, but struggle with those ideas in an applied laboratory context? Does the student understand the ideas but have a difficult time with nomenclature?

How should parents react to a less than stellar report card?

It’s more appropriate to ask what sense parents should make of report cards in general. A parent whose child gets all As should be asking questions, just like a parent whose child gets all Cs or Ds or Fs. Simply accepting some general level of performance, whether it’s very good or not, suggests you don’t actually want to know anything detailed about what your child is learning.

All parents, regardless of the letter grade, should say, “Help me understand in more detail what my child knows and can do, and what my child doesn’t seem to know and is having trouble doing.”

Ask the teacher to be analytic about the child’s knowledge and performance, and to show the parent work the student has done that shows something they’re struggling with. Then the parent can see and ask, “What can you and I do in order to help this child?”

Is it helpful to reward good grades with money or gifts?

It’s highly problematic. From an educational research perspective, we foster kids learning things because they have intrinsic value, not because they have rewards attached.

Parents and teachers should foster children’s interest in learning for the sake of learning, not the pursuit of grades or external rewards. When you give cash or gifts to students, you’re saying to them that what matters most are the outcomes. And when you put the emphasis on simple outcomes, such as getting an A or a particular score on a standardized test, it becomes a defining feature of learning. The quality of student thinking and their interest in the content is diminished when they focus on getting grades for gifts.

This short piece generated more media attention than I anticipated, perhaps because many of those in the media are themselves parents facing their children’s report cards.

A follow-up to this short piece was my segment on Stephen Quinn’s CBC On the Coast. Listen here.

the difference between external and independent evaluation

The terms external and independent evaluation are often conflated, largely because external evaluations are (wrongly) assumed to be more independent than internal evaluations. A good example is the evaluation of the LAUSD iPad initiative conducted by the American Institutes for Research, which is described in an EdWeek story like this:

An independent evaluation of the Los Angeles Unified School District’s ambitious—and much-maligned—effort to provide digital devices to all students found that the new, multi-million dollar digital curriculum purchased as part of the initiative was seldom used last year because it had gaping holes, was seen by some teachers to lack rigor, and was plagued by technical glitches.

To be fair, AIR calls their evaluation external, NOT independent. And the evaluation IS external because the evaluators (AIR staff) are not members of the organization (LAUSD) in which the evaluand exists. They are external also to the iPad initiative, the program itself.

While a bit pedestrian, it is worth asking what is meant by independent ~ the dictionary offers three senses: free from outside control; not depending on another for livelihood; not connected with another ~ so it is clearer how external and independent are not synonyms.

Using the LAUSD iPad example, is AIR’s evaluation independent? The first sense of independence would suggest the evaluation is free from control by anyone outside of AIR and the AIR evaluation team ~ that the evaluation is not influenced by anyone, including the LAUSD, Pearson or Apple. It is clear from the report that the evaluation is indeed influenced by the LAUSD, which shaped it by asking questions that are relevant and desirable to them, although there is no obvious influence from Pearson or Apple, the two corporations providing the hardware, software, and professional development. This is absolutely typical in evaluation ~ those who commission the evaluation influence its focus, and often how the evaluation is done (although whether that was the case in this evaluation is not explicit in the report).

A key illustration of the LAUSD’s influence on the evaluation is the description of the program milestones, the first of which is characterized as the awarding of the contract to Apple in June 2013. But it is clear this is not the first milestone: a LAUSD Board report released in August 2014 points to Superintendent Deasy’s manipulation of the bidding process so that it would be a foregone conclusion the successful vendor would be the Apple/Pearson combo. AIR evaluators would have known about this. There is also no mention of the LAUSD’s refusal, when the project was rolled out, to reveal how much money had been paid to Pearson, a subcontractor to Apple on the $30 million first phase of the project.

Evaluators might argue that these matters are not the focus of the evaluation as framed by the evaluation questions, and that is likely true. The problem is that the evaluation questions are usually (and there is no reason to believe this wasn’t the case with the AIR evaluation of the iPad initiative) mutually agreed upon by the external evaluator and the organization contracting for the evaluation. That an organization would not want to include issues of malfeasance, transparency and accountability is understandable in many cases. A truly independent evaluation would necessarily include these issues, as well as other unanticipated circumstances and outcomes. The lack of independence is structural (in who commissions evaluations), privileging the perspectives of decision-makers, funders and CEOs.

The second sense of independence points to a failure for every external evaluation ~ external evaluators are in an immediate sense dependent on whomever commissions the evaluation for their subsistence and in the longer term sense if they wish to do evaluations for this organization again, or even other organizations who may monitor how the first sense of independence is treated in past evaluations. External evaluations lack financial independence.

And, external evaluations fail on the third sense of independence because the evaluators and the organizations commissioning evaluations of themselves or their programs are connected to one another, certainly financially but also often in an ongoing relationship with one another.

Whose interests are served and how?

Because of the lack of structural and financial independence, external evaluations (as much as internal evaluations) emphasize some interests and serve some ends, while ignoring or bracketing others. In the LAUSD iPad initiative, the interests of the LAUSD as a whole, the Board, and John Deasy are served both by what is included and what is excluded. The AIR evaluation provides a good descriptive account of the roll out of a major technology initiative, including issues with levels and types of use, quality of curriculum, and what worked well (the use of apps, for example). The evaluation could not be construed as positive on the Pearson curriculum content.

But by avoiding issues around the initial bidding process, the specific interests of Deasy, Apple and Pearson are also served. What does it mean that both Deasy and Apple were involved in manipulating the bidding for the contract? Put in the context of Apple’s aggressive marketing of iPads to schools, this becomes potentially an example of profit-making over learning. Apple’s last quarterly earnings report claims more than 13 million iPads have been sold globally for education; two and a half iPads are sold for every Mac in K-12 education. The secretive partnering with Pearson, a company recognized more for making profit than making educational gains, should be an additional piece of an independent evaluation. Corporations whose primary interest is profit-making and who mastermind programs and products deserve scrutiny for how their interests intersect with other interests (like teaching and learning).

Although there are few mechanisms for truly independent evaluations, professional evaluation associations and professional evaluators should be pondering how their work as either internal or external evaluators might be more independent, as well as developing strategies for conducting truly independent evaluations that are simply not compromised by the structural and financial relationships that characterize virtually all evaluations.

The “evaluate that” campaign

I am totally sympathetic with teachers’ reactions to the simplistic, pedestrian ways of evaluating the quality of their work, the quality of student work, and the quality of schools. That efforts are made to reduce complex evaluands to simple ones is a serious problem. The “EVALUATE THAT” campaign identifies important aspects of teaching and education that aren’t measured and are therefore not evaluated… things like compassion, empathy, cooperation… the emotional, interactional content of the work of teaching. [Click here, for the heartfelt remarks of one teacher.] The campaign (started by BadAss Teachers who created the meme shown in this post) also suggests these things can’t be measured and can’t be evaluated. Stories are being aggregated with the use of the Twitter hashtag #evaluatethat.

Whether you are a teacher, student, parent, administrator… tell us, in a brief sentence or two, YOUR moments of teaching or learning (yours or someone else’s) that was never formally measured but made an impression on you. These ‘bites’ of reality do not have to be all gloriously positive, the only criteria – true, real and not measured (no hypotheticals please).

We are collecting these via Twitter by using #evaluatethat hashtag in each relevant tweet. This will ensure all of these are kept in one place and can be easily seen by all.

The hashtag has taken on a bit of a f*&k you tone… you can sort of imagine the tweeter grabbing their crotch while they shout “EVALUATE THAT.” Even so, the collection of stories is an important reminder of the complexity of teaching and schooling… a complexity that needs to be incorporated into judgements of the quality of teaching, learning and schooling. While it may be very difficult to measure such things as compassion and empathy, that’s not a reason to step away, but all the more reason to find sound ways of incorporating those behaviors and actions into evaluations.

Elliot Eisner ~ in memoriam

Elliot Eisner brought the concepts of connoisseurship and criticism from the world of art to enable new ways of thinking about educational evaluation. He died at home on January 10, 2014 and the field of evaluation has lost an important founding thinker.

In 1976, Eisner made an observation that is as true today as it was then,

First, the forms of evaluation that are now employed to assess the effectiveness of school programs have profound consequences upon the character of teaching, the content of curriculum, and the kinds of goals that schools seek to attain. Evaluation procedures, more than a reasoned philosophy of education, influence the educational priorities at work within the schools. Second, these evaluation procedures rest upon largely unexamined assumptions that are basically scientific in their epistemology, technological in their application, and have consequences that are often limited and at times inhospitable to the kinds of goals the arts can achieve.

He went on to describe how connoisseurship and criticism ~ concepts from the arts through which he conceptualized teaching as artistry and schooling as a cultural artifact ~ offered both appreciation (through connoisseurship) and critique (through articulation of the ineffable qualities of teaching, learning and schools).

Eisner’s The Educational Imagination: On the Design and Evaluation of School Programs is a classic.

Should we give grades in school?

A few school districts in western Canada have moved away from percentage grades to categorical grades and toward genuinely involving students and parents in conferences about learning. In BC, the Maple Ridge and Pitt Meadows school district has replaced letter grades with what they are calling a student-inclusive conferencing model. The Battle River school division in Alberta has replaced percentage grades with a categorical grading of beginning, developing, achieving, or excelling. This change was implemented some time ago for elementary and junior high schools, and is now being extended to the high school. In both cases, participating in the new grading systems is optional for teachers. The change, in both cases, has been controversial… yea-sayers and nay-sayers abound. In AB there have been parent and student protests.

Today, I was on CBC Radio 1’s The 180 with Jim Brown, debating the use of grades in school ~ Michael Zwaagstra, who is affiliated with the neo-liberal Frontier Centre for Public Policy, representing the “we can’t live without percentage grades” position and I representing the “schools would be better places without grades” position.

CBC’s mini online poll shows voters split about equally between percentage grades and other approaches to grading.

Click here to hear the show (the interview/debate happens in the second half hour).

Holding accountability to account

One of the hallmarks of any quality evaluation is that it ought itself to be subject to evaluation. Many evaluation schemes in education, such as test-driven accountability schemes, are not evaluated. The Action Canada Task Force on Standardized Testing has released a report analyzing the place of standardized testing as an accountability measure in Canadian K-12 education systems, using Ontario as a case study focus. “A review of standardized testing in this province and others is not only timely – it’s urgently needed,” says Sébastien Després, a 2012-2013 Action Canada Fellow and co-author of the report.

The Task Force offers four recommendations that could be the heart of an evaluation of accountability schemes in K-12 education across Canada.

Recommendations
We recommend that the Ontario government establish a suitable panel with a balanced and diverse set of experts to conduct a follow-up review of its standardized testing program. In particular:

A. Structure of the tests relative to objectives
i. The panel should review whether the scope of the current testing system continues to facilitate achievement of education system objectives.
ii. The panel should review whether the scale and frequency of testing remains consistent with the Ministry of Education’s objectives for EQAO testing.

B. Impact of testing within the classroom
i. The panel should review the impact on learning that results from classroom time devoted to test preparation and administration.
ii. The panel should review the impact of testing methods and instruments on broader skills and knowledge acquisition.
iii. The panel should review the appropriateness and impact of the pressure exerted by standardized testing on teachers and students.

C. Validity of test results
i. The panel should review whether or not standardized testing provides an assurance that students are performing according to the standards set for them.
ii. The panel should review the impact of measuring progress by taking a limited number of samples throughout a student’s career.

D. Public reporting and use of test results
i. The panel should review the impact of the potential misinterpretation and misuse of testing results data, and methods for ensuring they are used as intended.
ii. The panel should review supplemental or alternative methods of achieving public accountability of the educational system.