Category Archives: Educational evaluation

The difference between external and independent evaluation

The terms external and independent evaluation are often conflated, largely because external evaluations are (wrongly) assumed to be more independent than internal evaluations. A good example is the evaluation of the LAUSD iPad initiative conducted by the American Institutes for Research, which is described in an EdWeek story like this:

An independent evaluation of the Los Angeles Unified School District’s ambitious—and much-maligned—effort to provide digital devices to all students found that the new, multi-million dollar digital curriculum purchased as part of the initiative was seldom used last year because it had gaping holes, was seen by some teachers to lack rigor, and was plagued by technical glitches.

To be fair, AIR calls their evaluation external, NOT independent. And the evaluation IS external because the evaluators (AIR staff) are not members of the organization (LAUSD) in which the evaluand exists. They are external also to the iPad initiative, the program itself.

While a bit pedestrian, it is worth asking what is meant by independent, to make clearer how external and independent are not synonyms.

Using the LAUSD iPad example, is AIR’s evaluation independent? The first sense of independence would suggest the evaluation is free from control by anyone outside of AIR and the AIR evaluation team ~ that the evaluation is not influenced by anyone, including the LAUSD, Pearson or Apple. It is clear from the report that the evaluation is indeed influenced by the LAUSD, which posed questions relevant and desirable to it, although there is no obvious influence from Pearson or Apple, the two corporations providing the hardware, software, and professional development. This is absolutely typical in evaluation ~ those who commission the evaluation influence its focus, and often how the evaluation is done (although whether that was the case in this evaluation is not explicit in the report).

A key illustration of the LAUSD’s influence on the evaluation is the description of the program milestones, the first of which is characterized as the awarding of the contract to Apple in June 2013. But this is clearly not the first milestone: a LAUSD Board report released in August 2014 points to Superintendent Deasy’s manipulation of the bidding process so that it was a foregone conclusion the successful vendor would be the Apple/Pearson combination. AIR evaluators would have known about this. There is also no mention of the LAUSD’s refusal, when the project was rolled out, to reveal how much money had been paid to Pearson, a subcontractor to Apple on the $30 million first phase of the project.

Evaluators might argue that these matters are not the focus of the evaluation as framed by the evaluation questions, and that is likely true. The problem is that the evaluation questions are usually (and there is no reason to believe this wasn’t the case with the AIR evaluation of the iPad initiative) mutually agreed upon by the external evaluator and the organization contracting for the evaluation. That an organization would not want to include issues of malfeasance, transparency and accountability is understandable in many cases. A truly independent evaluation would necessarily include these issues, as well as other unanticipated circumstances and outcomes. The lack of independence is structural (in who commissions evaluations), privileging the perspectives of decision-makers, funders and CEOs.

The second sense of independence points to a failure of every external evaluation ~ external evaluators depend, in the immediate sense, on whoever commissions the evaluation for their subsistence, and in the longer term if they wish to do evaluations for this organization again, or for other organizations that may monitor how the first sense of independence was treated in past evaluations. External evaluations lack financial independence.

And external evaluations fail on the third sense of independence because the evaluators and the organizations commissioning evaluations of themselves or their programs are connected to one another ~ certainly financially, but also often in an ongoing relationship.

Whose interests are served and how?

Because of the lack of structural and financial independence, external evaluations (as much as internal evaluations) emphasize some interests and serve some ends, while ignoring or bracketing others. In the LAUSD iPad initiative, the interests of the LAUSD as a whole, the Board, and John Deasy are served by both what is included and what is excluded. The AIR evaluation provides a good descriptive account of the roll out of a major technology initiative, including issues with levels and types of use, the quality of the curriculum, and what worked well (the use of apps, for example). The evaluation could not, however, be construed as positive on the Pearson curriculum content.

But by excluding issues around the initial bidding process, the specific interests of Deasy, Apple and Pearson are also served. What does it mean that both Deasy and Apple were involved in manipulating the bidding for the contract? Put in the context of Apple’s aggressive marketing of iPads to schools, this becomes potentially an example of profit-making over learning. Apple’s last quarterly earnings report claims more than 13 million iPads have been sold globally for education; two and a half iPads are sold for every Mac in K-12 education. The secretive partnering with Pearson, a company recognized more for making profit than making educational gains, should be an additional piece of an independent evaluation. Corporations whose primary interest is profit making and who mastermind programs and products deserve scrutiny for how their interests intersect with other interests (like teaching and learning).

Although there are few mechanisms for truly independent evaluations, professional evaluation associations and professional evaluators should be pondering how their work as either internal or external evaluators might be more independent, as well as developing strategies for conducting truly independent evaluations that are simply not compromised by the structural and financial relationships that characterize virtually all evaluations.

The “evaluate that” campaign

I am totally sympathetic with teachers’ reactions to the simplistic, pedestrian ways of evaluating the quality of their work, the quality of student work, and the quality of schools. That efforts are made to reduce complex evaluands to simple ones is a serious problem. The “EVALUATE THAT” campaign identifies important aspects of teaching and education that are not measured and therefore not evaluated… things like compassion, empathy, cooperation… the emotional, interactional content of the work of teaching. [Click here for the heartfelt remarks of one teacher.] The campaign (started by BadAss Teachers, who created the meme shown in this post) also suggests these things can’t be measured and can’t be evaluated. Stories are being aggregated with the use of the Twitter hashtag #evaluatethat.

Whether you are a teacher, student, parent, administrator… tell us, in a brief sentence or two, YOUR moments of teaching or learning (yours or someone else’s) that were never formally measured but made an impression on you. These ‘bites’ of reality do not have to be all gloriously positive; the only criteria – true, real and not measured (no hypotheticals please).

We are collecting these via Twitter by using #evaluatethat hashtag in each relevant tweet. This will ensure all of these are kept in one place and can be easily seen by all.

The hashtag has taken on a bit of a f*&k you tone… you can sort of imagine the tweeter grabbing their crotch while they shout “EVALUATE THAT.” Even so, the collection of stories is an important reminder of the complexity of teaching and schooling… a complexity that needs to be incorporated into judgements of the quality of teaching, learning and schooling. While it may be very difficult to measure such things as compassion and empathy, that’s not a reason to step away, but all the more reason to find sound ways of incorporating those behaviors and actions into evaluations.

Elliot Eisner ~ in memoriam

Elliot Eisner brought the concepts of connoisseurship and criticism from the world of art to enable new ways of thinking about educational evaluation. He died at home on January 10, 2014 and the field of evaluation has lost an important founding thinker.

In 1976, Eisner made an observation that is as true today as it was then,

First, the forms of evaluation that are now employed to assess the effectiveness of school programs have profound consequences upon the character of teaching, the content of curriculum, and the kinds of goals that schools seek to attain. Evaluation procedures, more than a reasoned philosophy of education, influence the educational priorities at work within the schools. Second, these evaluation procedures rest upon largely unexamined assumptions that are basically scientific in their epistemology, technological in their application, and have consequences that are often limited and at times inhospitable to the kinds of goals the arts can achieve.

He went on to describe connoisseurship and criticism, concepts from the arts through which he conceptualized the artistry of teaching and schooling as cultural artifacts, both of which require appreciation (through connoisseurship) and critique (through articulation of the ineffable qualities of teaching, learning and schools).

Eisner’s The Educational Imagination: On the Design and Evaluation of School Programs is a classic.

Should we give grades in school?

A few school districts in western Canada have moved away from percentage grades toward categorical grades and genuinely involving students and parents in conferences about learning. In BC, the Maple Ridge and Pitt Meadows school district has replaced letter grades with what they are calling a student-inclusive conferencing model. The Battle River school division in Alberta has replaced percentage grades with a categorical grading of beginning, developing, achieving, or excelling. This change was implemented some time ago for elementary and junior high schools, and is now being extended to the high school. In both cases, participating in the new grading systems is optional for teachers. The change, in both cases, has been controversial… yea-sayers and nay-sayers abound. In AB there have been parent and student protests.

Today, I was on CBC Radio 1, The 180 with Jim Brown, debating the use of grades in school ~ Michael Zwaagstra, who is affiliated with the neo-liberal Frontier Centre for Public Policy, represented the “we can’t live without percentage grades” position, and I represented the “schools would be better places without grades” position.

CBC’s mini online poll shows about equal numbers of voters to be for percentage grades and for other approaches to grading.

Click here to hear the show (the interview/debate happens in the second half hour).

Holding accountability to account

One of the hallmarks of any quality evaluation is that it ought itself to be subject to evaluation. Many evaluation schemes in education, such as test-driven accountability schemes, are not evaluated. The Action Canada Task Force on Standardized Testing has released a report analyzing the place of standardized testing as an accountability measure in Canadian K-12 education systems, using Ontario as a case study focus. “A review of standardized testing in this province and others is not only timely – it’s urgently needed,” says Sébastien Després, a 2012-2013 Action Canada Fellow and co-author of the report.

The Task Force offers four recommendations that could be the heart of an evaluation of accountability schemes in K-12 education across Canada.

Recommendations
We recommend that the Ontario government establish a suitable panel with a balanced and diverse set of experts to conduct a follow-up review of its standardized testing program. In particular:

A. Structure of the tests relative to objectives
i. The panel should review whether the scope of the current testing system continues to facilitate achievement of education system objectives.
ii. The panel should review whether the scale and frequency of testing remains consistent with the Ministry of Education’s objectives for EQAO testing.

B. Impact of testing within the classroom
i. The panel should review the impact on learning that results from classroom time devoted to test preparation and administration.
ii. The panel should review the impact of testing methods and instruments on broader skills and knowledge acquisition.
iii. The panel should review the appropriateness and impact of the pressure exerted by standardized testing on teachers and students.

C. Validity of test results
i. The panel should review whether or not standardized testing provides an assurance that students are performing according to the standards set for them.
ii. The panel should review the impact of measuring progress by taking a limited number of samples throughout a student’s career.

D. Public reporting and use of test results
i. The panel should review the impact of the potential misinterpretation and misuse of testing results data, and methods for ensuring they are used as intended.
ii. The panel should review supplemental or alternative methods of achieving public accountability of the educational system.

Evaluating schools

If nothing else, we have learned a great deal about what doesn’t work in terms of evaluating schools. The global penchant for using a few outcome measures just doesn’t do the trick… this is perhaps most obvious in the USA, where judging the quality of schools continues to spiral downward from NCLB to Race to the Top, but around the world we see a similar story. And we see a few counterpoints, such as the success of the Finnish school system, where the focus is decidedly not on standardized outcomes on a few measures.

In British Columbia, Canada, where education is decidedly a provincial matter and where provincial politics can actually lead to quite radical shifts in policies and programmatic initiatives, this is a moment of potential change. BC schools have for many years now been held hostage by the scores on the Foundation Skills Assessment (FSA), a test given to all 4th and 7th grade students in the province. Support for the FSA has been eroding over the past several years, with a chorus of skeptical teacher, school administrator and school trustee voices.

One initiative, The Great Schools Project, has been developing alternative ideas about school evaluation. The website gives a sense of the GSP platform, and a bit more information about the issues can be heard in a segment of a local radio talk show.

Students evaluate teachers

While the strategy of students evaluating professors is common in higher education, this approach is rare in K-12 education. One component of the Measures of Effective Teaching Project at Harvard is just such data. Based on a decade-old survey developed by Ronald Ferguson, an economist at Harvard, a shorter survey has been developed that asks students to describe their classroom instructional climate. Importantly, students (all the way from kindergarten through high school) are not asked to judge their teachers, but to provide a description of what the classroom environment looks and feels like to them. The survey includes the following kinds of questions:

• Caring about students (Encouragement and Support)
o Example: “The teacher in this class encourages me to do my best.”
• Captivating students (Learning Seems Interesting and Relevant)
o Example: “This class keeps my attention – I don’t get bored.”
• Conferring with students (Students Sense their Ideas are Respected)
o Example: “My teacher gives us time to explain our ideas.”
• Controlling behavior (Culture of Cooperation and Peer Support)
o Example: “Our class stays busy and doesn’t waste time.”
• Clarifying lessons (Success Seems Feasible)
o Example: “When I am confused, my teacher knows how to help me understand.”
• Challenging students (Press for Effort, Perseverance and Rigor)
o Example: “My teacher wants us to use our thinking skills, not just memorize things.”
• Consolidating knowledge (Ideas get Connected and Integrated)
o Example: “My teacher takes the time to summarize what we learn each day.”
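To make the survey’s structure concrete, here is a minimal sketch of how responses to an instrument like this might be summarized by construct. The seven construct names come from the list above; everything else ~ the 1–5 response scale, the item counts, and scoring by simple averaging ~ is an assumption for illustration, not the project’s actual methodology.

```python
# Hypothetical sketch: summarizing a "7 Cs"-style climate survey by construct.
# Assumptions (not from the source): a 1-5 agreement scale, two items per
# construct, and a simple mean as the construct score.

from statistics import mean

# One student's responses, keyed by construct (1 = never true, 5 = always true).
responses = {
    "Caring":        [4, 5],
    "Captivating":   [3, 4],
    "Conferring":    [5, 4],
    "Controlling":   [3, 3],
    "Clarifying":    [4, 4],
    "Challenging":   [5, 5],
    "Consolidating": [2, 3],
}

def construct_scores(resp):
    """Average the item responses within each construct."""
    return {construct: mean(items) for construct, items in resp.items()}

print(construct_scores(responses))
```

The point of keeping scores at the construct level, rather than collapsing everything into a single number, is that it preserves a description of the classroom climate ~ which is what the students are being asked to provide.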

A recent story in The Atlantic summarizes the results.

The most refreshing aspect of Ferguson’s survey might be that the results don’t change dramatically depending on students’ race or income… But overall, even in very diverse classes, kids tend to agree about what they see happening day after day.

Whether these data should be used in teacher evaluation requires careful consideration, but from a larger evaluative perspective what this demonstrates is the very valuable data that those who are meant to benefit most from programs and interventions can provide ~ if you ask the right questions, and if you respect their experiences and perspectives.

Purpose of evaluation

This is a pre-publication version of an entry in the International Encyclopedia of Education, 3rd Edition. Please use the citation below and refer to the final version in the print edition of the IEE.

Mathison, S. (2010). The purpose of evaluation. In P. Peterson, B. McGaw & E. Baker (Eds.). The International Encyclopedia of Education, 3rd ed. Elsevier Publishers.

ABSTRACT
There are two primary purposes of evaluation in education: accountability and amelioration. Both purposes operate at multiple levels in education, from individual learning to bounded, focused interventions to whole organizations, such as schools or colleges. Accountability is based primarily on summative evaluation, that is, evaluation of fully formed evaluands, and is often used for making selection and resource allocation decisions. Amelioration is based primarily on formative evaluation, that is, evaluation of plans or developing evaluands, and is used to facilitate planning and improvement. Socio-political forces influence the purpose of evaluation.

Purpose of evaluation