An encyclopedia entry that introduces evaluation as a discipline.
Logic models (similar to program theory) are popular in evaluation. The presumption is that programs or interventions can be depicted in a linear input-output schema, simplistically depicted as: inputs → activities → outputs → outcomes → impact.
This schema can be illustrated by using the model to evaluate how an information fair on reproductive health contributes to the prevention of unwanted pregnancies.
The inputs are the money, labour, and facilities needed to produce the information fair.
The activity is organizing and presenting the information fair.
The output is that some people attend the info fair.
The outcome is that some of those who attend the fair act on the information provided.
The impact is that unwanted pregnancies are reduced. The idea is that each step in this causal chain can be evaluated. Did the inputs (money, etc.) really produce the intervention?
And did the activities produce the output (an informed audience)?
Did the output produce the outcome (how many attendees acted on the information)?
To measure the impacts, public health statistics could be used.
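The causal chain above can be sketched as a simple data structure, with each link generating the evaluative question that connects one step to the next. This is a minimal sketch; the step descriptions are paraphrased from the information-fair example, and the function is my own illustration, not a standard evaluation tool.

```python
# A minimal sketch of a linear logic model, using the information-fair
# example. Names and descriptions are illustrative.

LOGIC_MODEL = [
    ("inputs",     "money, labour, and facilities for the fair"),
    ("activities", "organizing and presenting the information fair"),
    ("outputs",    "people attend the fair"),
    ("outcomes",   "some attendees act on the information"),
    ("impact",     "unwanted pregnancies are reduced"),
]

def describe_causal_chain(model):
    """Render the evaluative question linking each step to the next."""
    questions = []
    for (step, desc), (next_step, _) in zip(model, model[1:]):
        questions.append(f"Did the {step} ({desc}) produce the {next_step}?")
    return questions

for question in describe_causal_chain(LOGIC_MODEL):
    print(question)
```

Representing the model this way makes the key point visible: a five-step chain implies four distinct causal claims, each of which can be evaluated on its own.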
A quick overview of logic models is provided on the Audience Dialogue website. One of the best online resources for developing and using logic models is the Kellogg Foundation’s Logic Model Development Guide; many visual examples of logic models are available, and Aboriginal logic models have also been developed.
See also Usable Knowledge’s short tutorial on creating a logic model.
And read Ian David Moss’ In Defense of Logic Models, which is probably the most reasoned response to many of the criticisms… take a look at the comments to his blog post as they extend the discussion nicely.
In some Olympic sports, thousandths of a second matter. In the men’s doubles luge run, the difference between the gold and silver medals was about half a second (0.522 seconds, to be exact). Lugers compete against a timer, and luge is probably one of the most precisely timed sports in the world. Just to be clear, luge specifies a base weight (90 kg for individuals, 180 kg for doubles) and lugers may add weights to their sleds so that each run is precisely the same weight; skill in maneuvering the track is what accounts for differences in time. Luge is a sport that is judged entirely on the outcome ~ the shortest time. How you get there doesn’t matter, other than that following the “perfect line” is understood to be the likeliest route to the finish in the least amount of time. However, in luge nuance is critical, and that nuance often escapes even the knowledgeable commentators who attempt to give spectators a sense of what is happening during a luge run. Mostly it comes down to this: a better run is one where the luger moves very little and doesn’t hit a wall!
For those of us doing program evaluation in the world of social, educational, health, and policy interventions, we might envy such precise measurements, but the work we do is different in a number of ways. Precision of measurement must be judged within the context of evaluation. First, we have no singular and unambiguous outcomes to measure. Our outcomes are constructs, ones that depend for their definition on values and ideologies. For example, poverty reduction might be an agreed upon outcome, but how that is conceptualized is quite elastic. And poverty reduction is likely conflated with other constructs like food security or affordable housing. Second, measures used in evaluation are not like time. We have no high-precision outcome measure analogous to time in luge competitions, in large part because of the ambiguity of our outcomes. And last, we seldom want to give up investigating process and focus solely on outcomes. In the social world, how we attempt to ameliorate problems is an essential component of the quality of those efforts… outcomes matter to be sure, but getting to outcomes matters as much, and sometimes more.
Perhaps a step closer to being a discipline, the American Evaluation Association project to define evaluation might signal that we are getting down to the fundamental ideas in our field. A committee has developed a definition that its chair, Michael Q. Patton, describes as “a living document, ever being updated and revised, but never becoming dogma or an official, endorsed position statement.” Bravo to all for this initiative!
The open-access, participatory strategy is an interesting and forward-thinking one, and I will be curious to see if and how that statement changes over time. My prediction is that it won’t change much. The statement as it is pretty much captures what anyone would say in an introductory evaluation course, but we shall see.
I think, however, there are a couple of key details missing from this definition… details that might bring clarity about the foundations of evaluation. As the definition now stands, it focuses primarily on evaluation practice and less so on the discipline of evaluation. The initial definition is what we all say when we explain what evaluation is:
Evaluation is a systematic process to determine merit, worth, value or significance.
The string of descriptors specifying what evaluation determines is important, and the terms are not interchangeable. The definition provides no guidance about what the differences are or why we include this string in our definition. What is the difference between merit and worth, and how are those different from value or significance? This is not a trivial matter, and lack of understanding about these distinctions sometimes gives evaluation a bad name. For example, when an evaluation focuses on determining the worth of an evaluand and the evaluand is found wanting, there is often a hue and cry if that same evaluand is simultaneously meritorious.
The second detail that is missing is the logic of how we get to those judgements of merit, worth, value and significance. The definition says that evaluation is a “systematic process” but provides no hint of what makes evaluation systematic. Perhaps this is one of those contentious areas that Patton describes when he introduced the statement, “There was lots of feedback, much of it contradictory.” But, from the statement, we cannot know whether the committee talked about including details about what makes evaluation systematic and couldn’t come to agreement, or if this was never discussed in the first place. Perhaps being systematic has two meanings that get entangled… we use models/approaches in evaluating that provide guidance about how to do evaluation (UFE, RCT, participatory, and so on) AND there is a logic to thinking evaluatively that is embedded in all models/approaches to evaluation. There is no need to include the former in a definition of evaluation, but there is a need to include the latter.
Michael Scriven has provided the grounding for articulating the logic of evaluation, Deborah Fournier has done considerable work on articulating what that logic looks like in practice (that is, how it is manifest in various evaluation approaches/models), and both Michael Scriven and Ernie House have tackled the specific issue of synthesis in evaluation. This logic is at the heart of what makes evaluation systematic and I’d like to see this in this definition. (For a quick introduction to these ideas, check out the entries in the Encyclopedia of Evaluation by these authors.)
As an organic, evolving definition of evaluation, perhaps these are components that will still be developed and included.
I. Feminist Theory, Research and Evaluation
1. Feminist Theory: Its Domain and Applications, Sharon Brisolara
2. Research and Evaluation: Intersections and Divergence, Sandra Mathison
3. Researcher/Evaluator Roles and Social Justice, Elizabeth Whitmore
4. A Transformative Feminist Stance: Inclusion of Multiple Dimensions of Diversity with Gender, Donna M. Mertens
5. Feminist Evaluation for Nonfeminists, Donna Podems
II. Feminist Evaluation in Practice
6. An Explication of Evaluator Values: Framing Matters, Kathryn Sielbeck-Mathes and Rebecca Selove
7. Fostering Democracy in Angola: A Feminist-Ecological Model for Evaluation, Tristi Nichols
8. Feminist Evaluation in South Asia: Building Bridges of Theory and Practice, Katherine Hay
9. Feminist Evaluation in Latin American Contexts, Silvia Salinas Mulder and Fabiola Amariles
III. Feminist Research in Practice
10. Feminist Research and School-Based Health Care: A Three-Country Comparison, Denise Seigart
11. Feminist Research Approaches to Empowerment in Syria, Alessandra Galié
12. Feminist Research Approaches to Studying Sub-Saharan Traditional Midwives, Elaine Dietsch
Final Reflection. Feminist Social Inquiry: Relevance, Relationships, and Responsibility, Jennifer C. Greene
Outcome Harvesting is an evaluation approach developed by Ricardo Wilson-Grau. Using a forensic approach, the evaluator or ‘harvester’ retrospectively gleans information from reports, personal interviews, and other sources to document how a given program, project, organization, or initiative has contributed to outcomes. Unlike so many evaluation approaches that begin with stated outcomes or objectives, this approach looks for evidence of outcomes, and explanations for those outcomes, in what has already happened… a process the creators call ‘sleuthing.’
This approach blends together, and perhaps eliminates the distinction between, intended and unintended outcomes. Evaluators are enjoined to look beyond what programs say they will do to what they actually do, but in an objectives-driven world this requires evaluators to convince clients that this is important or necessary, and to justify spending evaluation resources on a broader conception of outcomes than is usually defined.
Wilson-Grau has written a clear explanation of the process, which can be downloaded here. In the downloadable pdf, the six steps of outcome harvesting are summarized:
1. Design the Outcome Harvest: Harvest users and harvesters identify useable questions to guide the harvest. Both users and harvesters agree on what information is to be collected and included in the outcome description as well as on the changes in the social actors and how the change agent influenced them.
2. Gather data and draft outcome descriptions: Harvesters glean information about changes that have occurred in social actors and how the change agent contributed to these changes. Information about outcomes may be found in documents or collected through interviews, surveys, and other sources. The harvesters write preliminary outcome descriptions with questions for review and clarification by the change agent.
3. Engage change agents in formulating outcome descriptions: Harvesters engage directly with change agents to review the draft outcome descriptions, identify and formulate additional outcomes, and classify all outcomes. Change agents often consult with well-informed individuals (inside or outside their organization) who can provide information about outcomes.
4. Substantiate: Harvesters obtain the views of independent individuals knowledgeable about the outcome(s) and how they were achieved; this validates and enhances the credibility of the findings.
5. Analyze and interpret: Harvesters organize outcome descriptions through a database in order to make sense of them, analyze and interpret the data, and provide evidence-based answers to the useable harvesting questions.
6. Support use of findings: Drawing on the evidence-based, actionable answers to the useable questions, harvesters propose points for discussion to harvest users, including how the users might make use of findings. The harvesters also wrap up their contribution by accompanying or facilitating the discussion amongst harvest users.
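The six steps above form an ordered pipeline, with each step building on the one before it. As a thought experiment, a harvest team's progress could be tracked with a small sketch like the following; the class and its behaviour are my own illustration, not part of Wilson-Grau's method, though the step names paraphrase his summary.

```python
# Illustrative tracker for the six outcome-harvesting steps.
# The step names paraphrase the summary above; the tracking logic
# (completing steps strictly in order) is my own sketch.

HARVEST_STEPS = [
    "design the outcome harvest",
    "gather data and draft outcome descriptions",
    "engage change agents in formulating outcome descriptions",
    "substantiate",
    "analyze and interpret",
    "support use of findings",
]

class OutcomeHarvest:
    def __init__(self):
        self.completed = []

    def complete_step(self, step):
        """Steps must be completed in order -- each builds on the last."""
        expected = HARVEST_STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected step: {expected!r}")
        self.completed.append(step)

    @property
    def next_step(self):
        done = len(self.completed)
        return HARVEST_STEPS[done] if done < len(HARVEST_STEPS) else None

harvest = OutcomeHarvest()
harvest.complete_step("design the outcome harvest")
print(harvest.next_step)  # gather data and draft outcome descriptions
```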
Other evaluation approaches (like the Most Significant Change technique or the Success Case Method) also look retrospectively at what happened and seek to analyze who, why and how change occurred, but this is a good addition to the evaluation literature. An example of outcome harvesting is described on the BetterEvaluation Blog. A short video introduces the example: https://www.youtube.com/watch?v=lNhIzzpGakE
For many professionals doing evaluation is part of the job. Lawyers make evaluative judgements about the quality of evidence; teachers judge the quality of students’ learning; builders judge the quality of materials. All work entails judgements of quality, and the quality of work is dependent on doing good evaluations.
But what happens when the evaluation done as part of professional work is contested? You might just find yourself being sued. Such is the case for Dale Askey, a librarian at McMaster University. Askey’s job requires him to make judgements about the quality of published works, and in turn of publishers, so that he can make wise procurement decisions for his employer, decisions that have become ever more difficult with shrinking resources. The case can be easily summarized:
Librarian questions quality of a publishing house.
Librarian publicly criticizes said press on his personal blog.
Two years later, librarian and current employer get sued for libel and damages in excess of $4 million.
Read more at Inside Higher Ed: http://www.insidehighered.com/news/2013/02/08/academic-press-sues-librarian-raising-issues-academic-freedom
There is no reason to believe that Askey rendered his judgement about the quality of scholarship offered by Mellen Press in a capricious or incompetent manner. Making judgements for procurement decisions is surely one of the tasks that Askey’s employer expects him to do, especially in a time of diminishing resources.
There has been considerable support for Askey, some of it a bit misguided in defending his right to express his opinion on his blog, but most of it in defense of Askey’s responsibility to do his job.
So what is the relevance for evaluation? It is clear that evaluation is integral to, and applied in, virtually all other intellectual and practical domains… it is, as Michael Scriven claims, a trans-discipline. As such, there is a need to pay more attention to preparing people to do publicly defensible evaluations in the context of their work. Perhaps more than program evaluation, this sort of evaluative thinking might be the raison d’être for the discipline of evaluation.
The New York Times columnist David Brooks nicely captures the problem that Carol Weiss identified several decades ago… data doesn’t speak authoritatively, nor should it. In evaluation and in decision making we take into account loads of data, available to us in greater amounts and more sophisticated ways, but it still is never enough on its own.
Brooks highlights the limitations of what he calls “big data.”
Data struggles with the social. Your brain is pretty bad at math (quick, what’s the square root of 437), but it’s excellent at social cognition. People are really good at mirroring each other’s emotional states, at detecting uncooperative behavior and at assigning value to things through emotion.
Computer-driven data analysis, on the other hand, excels at measuring the quantity of social interactions but not the quality. Network scientists can map your interactions with the six co-workers you see during 76 percent of your days, but they can’t capture your devotion to the childhood friends you see twice a year, let alone Dante’s love for Beatrice, whom he met twice.
Therefore, when making decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk.
Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. The human brain has evolved to account for this reality. People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.
Data creates bigger haystacks. This is a point Nassim Taleb, the author of “Antifragile,” has made. As we acquire more data, we have the ability to find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. Falsity grows exponentially the more data we collect. The haystack gets bigger, but the needle we are looking for is still buried deep inside.
One of the features of the era of big data is the number of “significant” findings that don’t replicate… the expansion, as Nate Silver would say, of noise relative to signal.
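Taleb's haystack point is easy to demonstrate with a small simulation: generate many variables of pure noise, test every pair for correlation, and "significant" correlations appear at roughly the rate the significance threshold predicts, even though no real relationship exists. This sketch is my own illustration, not Brooks's or Taleb's analysis; the critical-r value is the standard two-tailed cutoff for 30 observations.

```python
# Pure-noise variables still yield "statistically significant"
# correlations at roughly the alpha rate. Illustrative simulation.
import random
from statistics import mean

random.seed(42)

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n_obs, n_vars = 30, 50
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Critical |r| for p < .05 (two-tailed) with n = 30 is about 0.361.
CRITICAL_R = 0.361
spurious = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson_r(data[i], data[j])) > CRITICAL_R
)
pairs = n_vars * (n_vars - 1) // 2
print(f"{spurious} 'significant' correlations out of {pairs} pure-noise pairs")
```

With 50 noise variables there are 1,225 pairs to test, so at the conventional .05 threshold we should expect on the order of sixty "findings" from nothing at all; the haystack grows faster than the needle.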
Big data has trouble with big problems. If you are trying to figure out which e-mail produces the most campaign contributions, you can do a randomized control experiment. But let’s say you are trying to stimulate an economy in a recession. You don’t have an alternate society to use as a control group. For example, we’ve had huge debates over the best economic stimulus, with mountains of data, and as far as I know not a single major player in this debate has been persuaded by data to switch sides.
Data favors memes over masterpieces. Data analysis can detect when large numbers of people take an instant liking to some cultural product. But many important (and profitable) products are hated initially because they are unfamiliar.
Data obscures values. I recently saw an academic book with the excellent title, “ ‘Raw Data’ Is an Oxymoron.” One of the points was that data is never raw; it’s always structured according to somebody’s predispositions and values. The end result looks disinterested, but, in reality, there are value choices all the way through, from construction to interpretation.
This is a pre-publication version of an entry in the International Encyclopedia of Education, 3rd Edition. Please use the citation below and refer to the final version in the print edition of the IEE.
Mathison, S. (2010). The purpose of evaluation. In P. Peterson, B. McGaw & E. Baker (Eds.), The International Encyclopedia of Education (3rd ed.). Elsevier.
There are two primary purposes of evaluation in education: accountability and amelioration. Both purposes operate at multiple levels in education, from individual learning to bounded, focused interventions to whole organizations such as schools or colleges. Accountability is based primarily on summative evaluation, that is, the evaluation of fully formed evaluands, and is often used for making selection and resource allocation decisions. Amelioration is based primarily on formative evaluation, that is, the evaluation of plans or developing evaluands, and is used to facilitate planning and improvement. Socio-political forces influence the purpose of evaluation.
Beginning a new semester, I strive to find ways to illustrate that evaluative thinking is all around, and indeed may be the most ubiquitous form of thinking that we as humans do. In the words of The Troggs…
I feel it in my fingers
I feel it in my toes
[Evaluation] is all around me
And so the feeling grows
There are plenty of examples like Consumer Reports or Rotten Tomatoes or The Wine Spectator. But there is nothing like the many TV contest shows for entertaining examples of evaluation. This term my class watched the show Four Weddings and analyzed how the evaluation was done. It is quite a rich example that illustrates the logic of evaluation including such things as:
> the complexity of establishing criteria, including the need for far more specificity than one might think
> relative weighting of criteria
> the slippery and amorphous nature of standards
> how grading and ranking differ, but can be combined in a single evaluation
> what the attributes of a good evaluator might be
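The weighting-and-ranking logic such shows rely on can be sketched in a few lines. The criteria, weights, and scores below are invented for illustration; they are not taken from Four Weddings, but the sketch shows how grading (scoring each evaluand against criteria) and ranking (ordering evaluands by those grades) combine in a single evaluation.

```python
# Hypothetical criteria, weights, and contestant scores for a
# Four-Weddings-style evaluation. Weights sum to 1; all values invented.

CRITERIA_WEIGHTS = {"venue": 0.25, "food": 0.30, "dress": 0.20, "overall": 0.25}

scores = {  # grading: each wedding rated 0-10 on each criterion
    "wedding_a": {"venue": 8, "food": 6, "dress": 9, "overall": 7},
    "wedding_b": {"venue": 7, "food": 9, "dress": 6, "overall": 8},
}

def weighted_grade(ratings):
    """Combine criterion ratings into one grade using the agreed weights."""
    return sum(CRITERIA_WEIGHTS[c] * r for c, r in ratings.items())

# ranking: order the contestants by their weighted grades
ranking = sorted(scores, key=lambda w: weighted_grade(scores[w]), reverse=True)
for place, wedding in enumerate(ranking, start=1):
    print(place, wedding, round(weighted_grade(scores[wedding]), 2))
```

Even this toy version surfaces the issues the class noticed: change the weights and the ranking can flip, which is exactly why establishing and weighting criteria is where the real evaluative argument happens.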
It’s written on the wind
It’s everywhere I go, oh yes, it is
So if you really [evaluate]
Come on and let it show