Category Archives: Evaluation theory

Evaluators cannot be useful if their only skill is data generation

The New York Times columnist David Brooks nicely captures the problem that Carol Weiss identified several decades ago: data does not speak authoritatively, nor should it. In evaluation, as in decision making generally, we take into account loads of data, available to us in ever greater amounts and in more sophisticated ways, but data alone is never enough.

Brooks highlights the limitations of what he calls “big data.”

Data struggles with the social. Your brain is pretty bad at math (quick, what’s the square root of 437), but it’s excellent at social cognition. People are really good at mirroring each other’s emotional states, at detecting uncooperative behavior and at assigning value to things through emotion.

Computer-driven data analysis, on the other hand, excels at measuring the quantity of social interactions but not the quality. Network scientists can map your interactions with the six co-workers you see during 76 percent of your days, but they can’t capture your devotion to the childhood friends you see twice a year, let alone Dante’s love for Beatrice, whom he met twice.

Therefore, when making decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk.

Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. The human brain has evolved to account for this reality. People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.

Data creates bigger haystacks. This is a point Nassim Taleb, the author of “Antifragile,” has made. As we acquire more data, we have the ability to find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. Falsity grows exponentially the more data we collect. The haystack gets bigger, but the needle we are looking for is still buried deep inside.

One of the features of the era of big data is the number of “significant” findings that don’t replicate: the expansion, as Nate Silver would say, of noise relative to signal.
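Taleb’s bigger-haystack point is easy to demonstrate with a quick simulation: generate purely random variables, test every pairwise correlation, and count how many clear a conventional significance threshold. The sketch below is a minimal illustration (the variable counts, sample size, and the 1.96/√n approximation to the p < .05 cutoff are my choices, not anything from Brooks or Taleb):

```python
import random
import math

random.seed(1)

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n_obs = 100
critical_r = 1.96 / math.sqrt(n_obs)  # rough |r| threshold for p < .05

for n_vars in (10, 50, 100):
    # Pure noise: every variable is independent standard-normal draws.
    data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    spurious = sum(
        1
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(pearson_r(data[i], data[j])) > critical_r
    )
    pairs = n_vars * (n_vars - 1) // 2
    # More variables -> far more pairs -> far more "significant" noise.
    print(f"{n_vars} variables, {pairs} pairs, {spurious} spurious 'findings'")
```

With about 5% of all pairs expected to look “significant” by chance, going from 10 variables (45 pairs) to 100 variables (4,950 pairs) multiplies the spurious findings a hundredfold while the number of real relationships (zero, here) stays fixed: the haystack grows, the needle doesn’t.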

Big data has trouble with big problems. If you are trying to figure out which e-mail produces the most campaign contributions, you can do a randomized control experiment. But let’s say you are trying to stimulate an economy in a recession. You don’t have an alternate society to use as a control group. For example, we’ve had huge debates over the best economic stimulus, with mountains of data, and as far as I know not a single major player in this debate has been persuaded by data to switch sides.

Data favors memes over masterpieces. Data analysis can detect when large numbers of people take an instant liking to some cultural product. But many important (and profitable) products are hated initially because they are unfamiliar.

Data obscures values. I recently saw an academic book with the excellent title, “ ‘Raw Data’ Is an Oxymoron.” One of the points was that data is never raw; it’s always structured according to somebody’s predispositions and values. The end result looks disinterested, but, in reality, there are value choices all the way through, from construction to interpretation.

Purpose of evaluation

This is a pre-publication version of an entry in the International Encyclopedia of Education, 3rd Edition. Please use the citation below and refer to the final version in the print edition of the IEE.

Mathison, S. (2010). The purpose of evaluation. In P. Peterson, B. McGaw, & E. Baker (Eds.), The International Encyclopedia of Education (3rd ed.). Elsevier.

There are two primary purposes of evaluation in education: accountability and amelioration. Both purposes operate at multiple levels in education, from individual learning to bounded, focused interventions to whole organizations, such as schools or colleges. Accountability is based primarily on summative evaluation, that is, evaluation of fully formed evaluands, and is often used for making selection and resource allocation decisions. Amelioration is based primarily on formative evaluation, that is, evaluation of plans or developing evaluands, and is used to facilitate planning and improvement. Socio-political forces influence which purpose prevails.

Evaluation is all around

At the beginning of each new semester I strive to find ways to illustrate that evaluative thinking is all around us, and indeed may be the most ubiquitous form of thinking we humans do. In the words of The Troggs…

I feel it in my fingers
I feel it in my toes
[Evaluation] is all around me
And so the feeling grows

There are plenty of examples like Consumer Reports or Rotten Tomatoes or The Wine Spectator. But there is nothing like the many TV contest shows for entertaining examples of evaluation. This term my class watched the show Four Weddings and analyzed how the evaluation was done. It is quite a rich example that illustrates the logic of evaluation, including such things as:

    > the complexity of establishing criteria, including the need for far more specificity than one might think
    > relative weighting of criteria
    > the slippery and amorphous nature of standards
    > how grading and ranking differ, but can be combined in a single evaluation
    > what the attributes of a good evaluator might be
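Several of the points above, particularly relative weighting of criteria and the difference between grading and ranking, can be sketched concretely. The criteria, weights, and scores below are entirely made up for illustration (they are not the show’s actual rubric):

```python
# Hypothetical criteria and weights -- not Four Weddings' actual rubric.
WEIGHTS = {"dress": 0.2, "venue": 0.3, "food": 0.3, "overall": 0.2}

def grade(scores, weights=WEIGHTS):
    """Grade against a fixed standard: weighted score on a 0-10 scale."""
    return sum(weights[c] * scores[c] for c in weights)

def rank(contestants):
    """Rank relative to one another: best weighted grade first."""
    return sorted(contestants, key=lambda name: grade(contestants[name]),
                  reverse=True)

weddings = {
    "Ann":  {"dress": 8, "venue": 6, "food": 9, "overall": 7},
    "Beth": {"dress": 9, "venue": 9, "food": 5, "overall": 8},
    "Cara": {"dress": 6, "venue": 7, "food": 7, "overall": 6},
}

for name in rank(weddings):
    print(name, round(grade(weddings[name]), 2))  # Beth 7.6, Ann 7.5, Cara 6.6
```

The sketch makes the grading/ranking distinction visible: grading compares each wedding to a fixed standard (the weighted 0–10 scale), while ranking only orders contestants relative to one another. Ann and Beth grade almost identically, yet ranking must still put one first, and a small shift in the weights could reverse the order without changing anyone’s raw scores.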

It’s written on the wind
It’s everywhere I go, oh yes, it is
So if you really [evaluate]
Come on and let it show

AEA’s Values – Walking the Talk with Sandra Mathison

AEA’s Values Statement
The American Evaluation Association values excellence in evaluation practice, utilization of evaluation findings, and inclusion and diversity in the evaluation community.

i. We value high quality, ethically defensible, culturally responsive evaluation practices that lead to effective and humane organizations and ultimately to the enhancement of the public good.
ii. We value high quality, ethically defensible, culturally responsive evaluation practices that contribute to decision-making processes, program improvement, and policy formulation.
iii. We value a global and international evaluation community and understanding of evaluation practices.
iv. We value the continual development of evaluation professionals and the development of evaluators from under-represented groups.
v. We value inclusiveness and diversity, welcoming members at any point in their career, from any context, and representing a range of thought and approaches.
vi. We value efficient, effective, responsive, transparent, and socially responsible association operations.


I am a Professor of Education at the University of British Columbia where I continue my lifelong engagement in learning and teaching about evaluation. I am also currently Editor-in-Chief of New Directions for Evaluation, have served on various AEA committees, and have been a member of the AEA Board of Directors.

As evaluation theory and practice have taken root and spread around the globe, it is exciting to see AEA adopting a Values Statement that reflects the importance of our intellectual and moral obligations as an organization and a profession. As Editor-in-Chief of one of AEA’s journals, I see these values as critical signposts for fostering the involvement, at many levels, of diverse perspectives. These values are important in how I do my work, which includes assembling a team of associate editors and editorial board members that has a global reach and reflects the cultural diversity that AEA’s values encourage. These values are also important in shaping the content of the journal, as I have worked to include different points of view about evaluation; the perspectives of experienced and novice evaluators; and the perspectives of evaluators from all parts of the world.

AEA’s Values Statement is also an important anchor for me individually, in doing evaluation as well as teaching about evaluation. From the beginning of my career until the present I have been guided by AEA’s values. I strive to foster inclusion through participatory approaches, practice in ethically defensible ways through transparency and thoughtfulness, and to do evaluation in the service of both clients and the greater good.

I contribute to AEA’s value of continual development of evaluators through my teaching and mentoring of graduate students as they learn the craft of evaluation. I am aware at all times of the importance not simply of transmitting knowledge and skills to the next generation of evaluators, but also of imbuing novice evaluators with the foundational values on which AEA stands. As a professional organization, AEA provides useful guidance and reminders that evaluation is much more than a technical practice; it is also a moral and values-laden one. In my teaching I am eager for students to see themselves as ethically engaged, open to and engaged with many forms of diversity, and to see their future work as evaluators as meaningful and useful in building a better world.

rights-based evaluation

Rights-based evaluation is not a new idea; it has been a prominent way of thinking in the international development world for at least a decade. A broad framework for human rights is established by the UN’s 1948 Universal Declaration of Human Rights and six core human rights treaties: the International Covenant on Civil and Political Rights; the International Covenant on Economic, Social and Cultural Rights; the Convention on the Rights of the Child; the Convention against Torture and other Cruel, Inhuman or Degrading Treatment or Punishment; the International Convention on the Elimination of All Forms of Racial Discrimination; and the Convention on the Elimination of All Forms of Discrimination against Women. (Each of these documents can be found at the UNICEF website.)

In evaluation we consider needs assessment to be a key tool, and this can be relatively easily understood as a rights assessment, especially when we understand that rights are what is essential to live as human beings, the basic standards without which people cannot survive and develop with dignity. Human rights are thus inherent to the person, inalienable and universal. People may still need more than they have a right to, but this would seem a fundamental baseline.

A useful, short guide to the idea of a rights-based approach for program development, as well as monitoring and evaluation is Applying a Rights-based Approach: An Inspirational Guide for Civil Society.

Rights-based evaluation requires evaluators to commit to general principles as key sources of criteria in evaluating; to take a socio-cultural perspective rather than just an inside-the-program-box view; and to give as much attention to whether the right job is being done as to whether the job is being done right.

getting to formative

Just as it makes little sense to talk about the validity of a test, it makes little sense to talk about a formative test. Although there is a welcome emphasis on formative assessment of student learning, there is an unfortunate confusion about what formative means. Too often the instrument is identified as formative when, in fact, it is how the information from the instrument is used that makes the evaluation formative. The same test, and the results of that test, can be used either formatively or summatively. Just as a test is not itself valid (it is the inferences made from it that have or lack validity), neither is a test itself formative or summative. Popham has a nice little discussion of this in his Ed Week piece, Formative Assessment: A Process, Not a Test.

Serving the Public Interest through Educational Evaluation

This is a pre-publication version of a chapter that analyzes the nature of educational evaluation in a global, neoliberal world. The chapter includes some advice to evaluators on how to take back evaluation to serve democratic values, that is, the inclusion of all stakeholders (especially those most often shut out) and open deliberation about what it means for education and schooling to be good or bad.

This chapter will appear in Ryan & Cousins’ edited International Handbook of Educational Evaluation to be published by Sage.

Evaluation, more than efficacy

In this op-ed piece, Francis Schrag points out a key feature of evaluation by comparing an evaluation of NCLB with that of the diabetes drug Avandia. A key element that must be included in any evaluation is side effects, the unanticipated outcomes. Even if the planned outcomes occur, and in large measure, they can be nullified by the presence of harmful unanticipated outcomes.

‘No Child Left Behind’ doesn’t provide full picture
Francis Schrag
Guest columnist — 6/13/2007

Newspaper readers may have noticed recent articles reporting test score performance of Wisconsin or Madison public school students as well as articles reporting controversy surrounding the diabetes drug Avandia. It’s illuminating to compare the two.

In the latter case, there is apparently strong evidence that Avandia is effective — it lowers the level of sugar in the blood. This fact, however, does not automatically lead to endorsement of the drug. Why not? Because as many now know, there are potential safety concerns, notably alleged increased risk of heart attack.

There is an important lesson here in the medical sphere that ought to carry over to the educational sphere: Efficacy is not all we care about.

Just as all drugs produce multiple effects, so do all education policies, such as the No Child Left Behind law passed in 2002. It is tempting to assess the impact of such a law simply by comparing test scores in math and reading (the two subjects where annual testing is authorized) before and after passage of the law. After all, test scores reflect student achievement, and that is presumably what we’re after.

Alas, this comparison, difficult enough to make for all sorts of reasons, is the equivalent of measuring blood-sugar levels before and after use of Avandia without taking into account any side effects.

Which side effects should be taken into account? It would be nice to evaluate many, but evaluation is costly and time-consuming, so let’s restrict ourselves to one that is significant: continuing motivation to learn in all subjects. Why this one? Because, just as the increased risk of heart attack may outweigh the beneficial effects of a diabetes drug, so might a reduction in continuing motivation outweigh a modest gain in achievement scores.

How could we assess continuing motivation? First, we need to compare the motivation of public school students subject to the law with matched private school students who are not. Second, we need to provide both groups of students with opportunities to manifest continuing interest in learning by giving them the option to participate in an activity that would manifest that interest, for example reading additional books over the summer or participating in an after-school science fair.

Evaluators will need plenty of imagination to come up with valid ways of tapping students’ motivation to learn. This may be difficult, but failing to consider important side effects of school learning is irresponsible. Without a conscientious effort to tap important side effects, we’ll have no basis for ruling out the possibility that a policy designed to raise test scores does so only by putting another valued outcome at risk.

We don’t want our educational policies to be the equivalent of Avandia, but so far we’re making no effort to find out if they are.

Francis Schrag is a professor emeritus in the Department of Educational Policy Studies at the University of Wisconsin-Madison.