Category Archives: Policy, politics

Does Evaluation Contribute to the Public Good?

In September I was honoured to give the opening keynote address at the 2017 Australasian Evaluation Society conference in Canberra. I am thankful for the opportunity and for the warm response the keynote received.

I expressed my pessimism, maybe even cynicism, about the extent to which evaluation has contributed to the public good, by which I mean the well-being of all people, globally, manifested in things such as food security, healthcare, education, clean water, and adequate housing. I also offered some hopeful suggestions about how evaluation as a practice might contribute more to the public good.

This talk has been translated into French and published in La Vigie de l’évaluation; it can be accessed here. An English version will be published soon, and I will post a link when it appears.

I also appreciate the media coverage this talk received in The Mandarin, an independent online newspaper devoted to government policy and practice in Australia. Click here for a link to that story, “Whoever Heard of an Independent Evaluation Keynote Tell It Like It Is?”


evaluation and independence… it matters

In the run-up to the Academy Awards I was catching up on nominated movies. This past weekend I saw several, including The Big Short. A Salon review summarizes the all-too-familiar story of the movie:

In the late 1990s, banks and private mortgage lenders began pushing subprime mortgages, many with “adjustable” rates that jumped sharply after a few years. These risky loans comprised 8.6 percent of all mortgages in 2001, soaring to 20.1 percent by 2006. That year alone, 10 lenders accounted for 56 percent of all subprime loans, totaling $362 billion. As the film explains, these loans were a ticking time bomb, waiting to explode.

While there is really nothing new revealed in the movie, there is a great scene in which Mark Baum (Steve Carell) confronts the Standard & Poor’s staffer who admits to giving high ratings to mortgage security bonds because the banks pay for the ratings. If S&P doesn’t deliver the high ratings, the banks will take their business elsewhere, perhaps to Moody’s. The profit incentive to be uncritical, to not evaluate, is overwhelming. Without admitting any wrongdoing, S&P (whose parent company is McGraw Hill Financial) took until 2015 to make reparations in a $1.4 billion settlement with the US Justice Department.

This is a particular and poignant message for evaluation and evaluators. Like so much else about the financial crisis, shortsightedness and greed resulted in false evaluations, ones with very serious consequences. S&P lied: they claimed to be making independent evaluations of the value of mortgage-backed securities, and the lie meant making a larger than usual profit and facilitating banks’ bogus instruments. Moody’s did the same thing. While the ratings agencies have made some minor changes to their evaluation procedures, the key features (a lack of independence, and the entanglement of their profit margins with those of their customers) have not changed. The consensus seems to be that nothing would preclude these evaluators from playing precisely the same role in the future.

In addition, while the ratings companies’ profits took a serious hit, the revenues of the big three agencies (Moody’s, S&P, and Fitch) have since surpassed pre-crisis levels, and Moody’s and S&P now look more attractive as businesses than most other financial firms do. Something worth pondering another day.

Individual evaluators may say, “Well, I wouldn’t do that,” and that may be to some extent true, but the same underlying relationships are repeated in all contracted evaluation work. If you are hiring me to do evaluation for you, and I want you to consider hiring me again in the future, then I am in the same relationship to you as the ratings agencies are to the financial institutions. This is a structural deficiency, and a serious one. In a soon-to-be-published book chapter (in Evaluation for an Equitable Society), I analyze how capitalism has overwhelmed pretty much everything. We are unable to see a role for evaluation theory and practice outside the fee-for-service framework dictated by the current neoliberal frames of social engagement.

In that chapter I offer suggestions about what evaluation can do, alongside being more responsible within a fee-for-service framework. First, evaluation needs to evaluate its own systems and instruments. Meta-analyses of evaluations (like those done by S&P, by pharmaceutical companies, by grant-funding agencies, in education, and so on) are necessary. Using our skills to ensure that what is being done in the name of evaluation is indeed evaluative, and not merely profiteering, is critically important. Second, professional evaluation associations need to promote structures for truly independent evaluations: evaluations solicited and paid for by third parties (government agencies, funding agencies, and so on) that have no profit to make but do, of course, have an interest in competently done, valid evaluation studies.


Participation in Humanitarian Evaluation


Chris Morris guest blogged on the Impact Ready blog about some findings from his research on the role of evaluation in accountability for interventions and programs in crisis-affected communities. In this post he focuses specifically on the lack of participation by local communities in evaluations that are meant to provide accountability to those most affected. Click here to read the whole post.

the difference between external and independent evaluation

The terms external and independent evaluation are often conflated, largely because external evaluations are (wrongly) assumed to be more independent than internal evaluations. A good example is the evaluation of the LAUSD iPad initiative conducted by the American Institutes for Research, which is described in an EdWeek story like this:

An independent evaluation of the Los Angeles Unified School District’s ambitious—and much-maligned—effort to provide digital devices to all students found that the new, multi-million dollar digital curriculum purchased as part of the initiative was seldom used last year because it had gaping holes, was seen by some teachers to lack rigor, and was plagued by technical glitches.

To be fair, AIR calls their evaluation external, NOT independent. And the evaluation IS external because the evaluators (AIR staff) are not members of the organization (LAUSD) in which the evaluand exists. They are also external to the iPad initiative, the program itself.

While a bit pedestrian, it is worth asking what is meant by independent ~ there are several senses of the word ~ so that it is clearer why external and independent are not synonyms.

Using the LAUSD iPad example, is AIR’s evaluation independent? The first sense of independence would suggest the evaluation is free from control by anyone outside of AIR and the AIR evaluation team ~ that the evaluation is not influenced by anyone, including the LAUSD, Pearson, or Apple. It is clear from the report that the evaluation is indeed influenced by the LAUSD, which shaped the evaluation around questions relevant and desirable to it, although there is no obvious influence from Pearson or Apple, the two corporations providing the hardware, software, and professional development. This is absolutely typical in evaluation ~ those who commission the evaluation influence its focus, and often how the evaluation is done (although whether that was the case in this evaluation is not explicit in the report).

A key to the influence the LAUSD has on the evaluation is illustrated in the description of the program milestones, the first of which is characterized as the awarding of the contract to Apple in June 2013. But it is clear this is not the first milestone: a LAUSD Board report released in August 2014 points to Superintendent Deasy’s manipulation of the bidding process so that it would be a foregone conclusion that the successful vendor would be the Apple/Pearson combo. AIR evaluators would have known about this. There is also no mention of the LAUSD’s refusal, when the project was rolled out, to reveal how much money had been paid to Pearson, a subcontractor to Apple on the $30 million first phase of the project.

Evaluators might argue that these matters are not the focus of the evaluation as framed by the evaluation questions, and that is likely true. The problem is that the evaluation questions are usually mutually agreed upon by the external evaluator and the organization contracting for the evaluation (and there is no reason to believe this wasn’t the case with the AIR evaluation of the iPad initiative). That an organization would not want to include issues of malfeasance, transparency, and accountability is understandable in many cases. A truly independent evaluation would necessarily include these issues, as well as other unanticipated circumstances and outcomes. The lack of independence is structural (rooted in who commissions evaluations), privileging the perspectives of decision-makers, funders, and CEOs.

The second sense of independence points to a failure of every external evaluation ~ external evaluators are dependent for their subsistence, in the immediate sense, on whoever commissions the evaluation, and, in the longer term, on that organization if they wish to do evaluations for it again, or for other organizations that may monitor how the first sense of independence was treated in past evaluations. External evaluations lack financial independence.

And external evaluations fail on the third sense of independence because the evaluators and the organizations commissioning evaluations of themselves or their programs are connected, certainly financially, but often also through an ongoing relationship with one another.

Whose interests are served and how?

Because of the lack of structural and financial independence, external evaluations (as much as internal evaluations) emphasize some interests and serve some ends, while ignoring or bracketing others. In the LAUSD iPad initiative, the interests of the LAUSD as a whole, the Board, and John Deasy are served by both what is included and what is excluded. The AIR evaluation provides a good descriptive account of the roll-out of a major technology initiative, including issues with levels and types of use, the quality of the curriculum, and what worked well (the use of apps, for example). The evaluation could not be construed as positive on the Pearson curriculum content.

But by excluding issues around the initial bidding process, the evaluation also serves specific interests of Deasy, Apple, and Pearson. What does it mean that both Deasy and Apple were involved in manipulating the bidding for the contract? Put in the context of Apple’s aggressive marketing of iPads to schools, this becomes potentially an example of profit-making trumping learning. Apple’s latest quarterly earnings report claims more than 13 million iPads have been sold globally for education; two and a half iPads are sold for every Mac in K-12 education. The secretive partnering with Pearson, a company recognized more for making profits than for producing educational gains, should be an additional piece of an independent evaluation. Corporations whose primary interest is profit-making and who mastermind programs and products deserve scrutiny for how their interests intersect with other interests (like teaching and learning).

Although there are few mechanisms for truly independent evaluations, professional evaluation associations and professional evaluators should be pondering how their work as either internal or external evaluators might be more independent, as well as developing strategies for conducting truly independent evaluations that are not compromised by the structural and financial relationships that characterize virtually all evaluations.

Olympic judges shouldn’t cheat; neither should evaluators

This is a pretty easy takeaway message, and figure skating is, not surprisingly, the sport to deliver it. Figure skating might be one of the roughest non-contact sports there is. Cheating by judges, and skaters attacking other skaters off the ice, are legendary. Judging scandals have resulted in a revised evaluation system that most would suggest isn’t much of an improvement (more about that in another post). To say that judging in figure skating has credibility problems is an understatement.

So it’s not surprising (even if it isn’t true) that as the competition begins there are rumours that the Russian and US judges are colluding to squeeze Canada out of any medals. As reported in the Globe and Mail, “The allegation implies the pact would see the U.S. judge dish out favourable marks to Russia in the team event, where the U.S. is not a contender for the podium, in exchange for the Russian judge boosting the scores for Americans Meryl Davis and Charlie White in the ice dance.” This sort of collusion harkens back to the 2002 Salt Lake City Olympics, where the Canadian pairs team lost the gold to Russia and the French judge Marie-Reine Le Gougne eventually revealed she had been pressured by the French federation, allegedly under the influence of a Russian mobster, to award the Russians high marks in exchange for similar treatment for France’s ice dance team. (For a quick summary, click here.) So yeah, rumour or truth, the fact that it’s happened before lends just a little weight to the “current” collusion accusations.

Most evaluators aren’t in a position to collude in quite the same way as these Machiavellian figure skating judges, but the advice ~ do not cheat ~ still holds. The cheating might take on a different form… like designing an evaluation you know will make the evaluand look like a failure. The best (meaning most egregious and obvious) example of this that comes to mind is Charles Murray’s evaluation of PUSH/Excel in the 1980s. Designing an evaluation that some have contended was inappropriate, and that doomed the program before the evaluation began, is cheating. Rigging the evaluation through a priori manipulation of the means for judging, whether in figure skating or program evaluation, just isn’t what we should do!

a blog post about whether I should be blogging…

The International Studies Association (political science folks) is discussing a proposal to ban Association journal editors, editorial board members, and anyone else associated with its journals from blogging. Here is the language:

“No editor of any ISA journal or member of any editorial team of an ISA journal can create or actively manage a blog unless it is an official blog of the editor’s journal or the editorial team’s journal,” the proposal reads. “This policy requires that all editors and members of editorial teams to apply this aspect of the Code of Conduct to their ISA journal commitments. All editorial members, both the Editor in Chief(s) and the board of editors/editorial teams, should maintain a complete separation of their journal responsibilities and their blog associations.”

Singling out blogs, but not other social media, letters to the editor, or op-eds, the ISA asserts that blogging is somehow unseemly, that it is a kind of discourse that is not proper professional behavior, and that if one blogs one is likely to sink into some abyss, losing one’s grasp on dignity and respectability.

At best this proposal is quaint, a desire for a past when professors stayed in their offices and wrote for and engaged with their peers through narrow publication channels (like the ISA journals). At worst, it is a draconian effort to challenge academic freedom, to squelch professors’ engagement in public life, and to control access to knowledge. The silliness of this proposal does little to obviate its threat to the civic engagement of scholars, both the activist-minded and those who simply understand that the world is bigger than the university campus.

The new symbolic evaluation

Recently I bought two new cars, and I have been on a vacation during which I stayed in numerous hotels, flew various airlines, and took a cruise. In almost every instance I was asked to evaluate the product and/or service. As an evaluator, I find it too easy to assume that requests for evaluation are genuine, that companies want feedback to improve their products and services. I’ve come to understand these requests are primarily symbolic.

First, the cars. Buying an expensive car, I was told that the company would send an evaluation form, but I was also asked to give all 5’s on the five-point Likert scale, since the company considered anything less than that a failure. My salesperson, who was very good, told me he would be investigated if he received anything less than a perfect score. I thought he did a really good job; I was happy. But everything wasn’t perfect, and in a true evaluation I would likely have made some suggestions for how the experience could have been improved for me. I didn’t, though; I gave the experience a perfect score.

UPDATE on this product:
Recently this car required servicing, and I experienced pretty poor customer service. Answering the customer satisfaction survey honestly unleashed a stream of email and telephone responses from the dealer.

Since December 11, 2010 you have been the owner of a 2011 BMW 550i purchased through XXX BMW and it seems that we have certainly missed the mark with your most recent service experience. We received a score from you of 78.6%. Our goal is 100% and we as a business are very interested in what we can do to get there.

If you don’t mind giving me a quick call it would be very appreciated so that I can express our expectations of our staff. Us not keeping you well informed is not the XXX BMW way and unacceptable by our standards. If you could possibly email me or call me in the future before filling out these surveys if anything was not 100% it would be appreciated as well. Our goal is to exceed your expectations with every interaction.

Looking forward to hearing from you so that we can turn your ownership experience around.

I received this email and a phone message, even though I indicated on the survey that I did not wish to be contacted… I considered my responses clear and straightforward. But as the email above indicates, the issue was less my clear and straightforward feedback than setting me straight about how these surveys ‘ought’ to be done. So, even though I had nothing else to say, I emailed the customer service agent back, and received yet another reply. The evaluative process began to feel like harassment.

Buying a modestly priced car, I also received an evaluation form. This one I completed honestly, and there were a number of issues (none fatal flaws) that, had they been addressed, would have improved the experience for me. No one at this car company told me I should give perfect scores, but the results of my honest evaluation might suggest they should have. A few days after completing the evaluation of this car-buying experience, I received a phone call from the person who had handled the financial part of buying the car. I was driving my new car at the time, and loved that I could answer the call directly from the car ~ that’s cool! The person calling began chastising me for the responses I gave on the evaluation, demanding to know more about why I wasn’t perfectly satisfied with the experience. The fact of the matter is that this person had been pushy about purchasing extras like replacement insurance and extended warranties; I thought it a hard sell and wondered if there were commissions involved. This is the feedback I provided. I reiterated to the finance person how I felt about the experience; she continued to harangue me, eventually reaching the point of yelling at me. At that point I terminated the call, which I could conveniently do by pressing a button on the dashboard screen in my car! Formative feedback was not what this person or car company wanted.

My experience after a month of travel that involved planes, buses, ships, and hotels was pretty similar to the car experiences. The invitations to evaluate services and products weren’t entirely genuine, but it was important for these companies to look as if they cared about customer satisfaction. I admit I likely take evaluation more seriously than the average person, but still I am impressed by the integration of largely symbolic evaluation into these corporate cultures. To evaluate has become standard operating procedure, but the practice has not matured into using the findings of those evaluations in truly formative and summative ways.

Should we give grades in school?

A few school districts in western Canada have moved away from percentage grades, toward categorical grades and genuinely involving students and parents in conferences about learning. In BC, the Maple Ridge-Pitt Meadows school district has replaced letter grades with what it calls a student-inclusive conferencing model. The Battle River school district in Alberta has replaced percentage grades with categorical grades of beginning, developing, achieving, and excelling; this change was implemented some time ago for elementary and junior high schools and is now being extended to the high school. In both cases, participating in the new grading system is optional for teachers. And in both cases the change has been controversial… yea-sayers and nay-sayers abound. In Alberta there have been parent and student protests.

Today I was on CBC Radio 1’s The 180 with Jim Brown, debating the use of grades in school ~ Michael Zwaagstra, who is affiliated with the neo-liberal Frontier Centre for Public Policy, representing the “we can’t live without percentage grades” position, and me representing the “schools would be better places without grades” position.

CBC’s mini online poll shows roughly equal numbers of voters favouring percentage grades and favouring other approaches to grading.

Click here to hear the show (the interview/debate happens in the second half hour).

Reflections of a Journal Editor

When my term as Editor-in-Chief of New Directions for Evaluation ended, I was asked to write a short piece for the AEA newsletter, as I had each year while I was EIC. I submitted a short reflection on knowledge and publishing rather than a summary of what was, and what would be, in NDE. I was told by Gwen Newman of AEA that the short piece would be published in the AEA newsletter, but three months have passed and it hasn’t appeared. I have no insight into why.

Below is the short reflective commentary I wrote.

As of December 2012 my term as Editor-in-Chief of New Directions for Evaluation ended, and Paul Brandon’s term began. AEA has made a fine choice in appointing Paul, and I wish him good luck in his new role.

Closing the book on six years working on NDE leads me to reflect on being an editor and the role of scholarly journals. I have enjoyed being the editor of NDE, I hope I have made a positive contribution to AEA, and I have tried to respect the diversity of viewpoints and varying degrees of cultural competence in the journal publishing game. I have enjoyed working with the newer generation of evaluators and those whose voices might not otherwise have been heard, but regret that this did not make up more of my time as NDE editor. I also have mixed feelings, even if, on balance, the good outweighs the bad.

Journal editors are gatekeepers, mediators, maybe even definers of the field, who are expected to oversee and ensure the fairness of an adjudication process that results in the stamp of approval on, and the dissemination of, the knowledge most worthy and relevant to the field. But in fulfilling this role, journal editors participate in a larger ‘game’ of knowledge production. Of course, others participate in the game as well, including authors, the reward systems in higher education, professional associations, publishing companies, and indeed journal readers. Pierre Bourdieu’s notion of “illusio” captures the ‘game’ of publishing in scholarly journals, a game where everyone must play, and even be taken in by the game, in order for the game to continue.

And so I have played a key role in this game, a game that is mostly seen as necessary, benign, civil, and collegial. I am, however, a bit disquieted by my complicity in a game where knowledge about evaluation theory and practice is commodified, packaged, and embargoed; a game that sometimes defines too narrowly what ought to be published, in what form, and by whom, and that limits access to knowledge. The illusio of the game leads us to believe that without stalwart gatekeepers and limited (often corporately owned) venues for sharing knowledge there will be excessive scholarly writing, and that it will be of dubious quality. There is little evidence to support this fear, and there is a growing number of highly regarded open-access journals, blogs, and websites that do not forsake quality and that suggest the possibility of a new game.

In a vision of the future where knowledge is a public good and freely shared, I imagine journal editors might play a different role in the game: a role that focuses less on gatekeeping and more on opening the gate to welcome the sharing of evaluation knowledge for free, with unfettered access, and without the need for authors to give away copyright to their works. While it may be the case that knowledge in some disciplines has a small, select audience, evaluation knowledge crosses all domains of human experience with an attendant desire to foster improvement. The audience for our work is vast, and I wish for thoughtful, inclusive sharing of evaluation knowledge.

If your job involves doing evaluation (and whose doesn’t), you might be sued

For many professionals, doing evaluation is part of the job. Lawyers make evaluative judgements about the quality of evidence; teachers judge the quality of students’ learning; builders judge the quality of materials. All work entails judgements of quality, and the quality of that work depends on doing good evaluations.

But what happens when the evaluation done as part of professional work is contested? You might just find yourself being sued. Such is the case with Dale Askey, a librarian at McMaster University. Askey’s job requires him to make judgements about the quality of published works and publishers so that, in turn, he can make wise procurement decisions for his employer, decisions that have become ever more difficult with shrinking resources. The case can be easily summarized:

Librarian questions quality of a publishing house.

Librarian publicly criticizes said press on his personal blog.

Two years later, librarian and current employer get sued for libel and damages in excess of $4 million.

Read more: http://www.insidehighered.com/news/2013/02/08/academic-press-sues-librarian-raising-issues-academic-freedom (Inside Higher Ed)

There is no reason to believe that Askey rendered his judgement about the quality of scholarship offered by Mellen Press in a capricious or incompetent manner. Making judgements for procurement decisions is surely one of the tasks that Askey’s employer expects him to do, especially in a time of diminishing resources.

There has been considerable support for Askey, some of it a bit misguided in defending his right to express his opinion on his blog, but most of it in defense of Askey’s responsibility to do his job.

There is every reason to expect that the Mellen Press lawsuit will be dismissed, as was the similar lawsuit brought by Mellen Press against Lingua Franca.

So what is the relevance for evaluation? It is clear that evaluation is integral to, and applied in, virtually all other intellectual and practical domains… it is, as Michael Scriven claims, a trans-discipline. As such, there is a need to pay more attention to preparing people to do publicly defensible evaluations in the context of their work. Perhaps more than program evaluation, this sort of evaluative thinking might be the raison d’être for the discipline of evaluation.