evaluation plans, some suggestions and resources

There is no one right way to do an evaluation plan, but there are some fairly standard components, like:

  • the purpose of the evaluation
  • program description
  • a stakeholder assessment
  • evaluation questions
  • how those questions will be answered
  • criteria and standards
  • a communication plan
  • how the evaluation will be managed ~ personnel, budget and timeline

This template from CDC is a good place to start: evaluation plan outline. The Evaluation Toolkit created by the Pell Institute provides direction as well. But also consider whether alternative formats for evaluation plans might be more effective, like this visual evaluation plan strategy using DoView described by Paul Duignan.

Because evaluation plans are meant to clearly outline what the evaluation will focus on and how it will be done, tables, charts, figures and graphics often communicate more succinctly than prose.

For example, describing the program and what it is supposed to accomplish might look like this:

[screenshot]
And evaluation questions, methods and even the overall evaluation design might look like this:

[screenshot]
And a communication plan might look like this:

[screenshot]
The ability to capture the big picture visually is also helpful for communicating the overall plan. Here are a few examples from Bethany Laursen, which Sheila Robinson shared on Evaltalk.

getting the most from formative evaluation

While the distinction between formative and summative evaluation is often drawn too sharply, there is a real distinction. For formative evaluation to really be formative, there needs to be a steady flow of evaluative feedback that allows for corrections along the way… that is, to do whatever is being done better, both in the present and into the future.

Compare two approaches to formative evaluation ~ real time evaluation and digital portfolios of student learning.

Real Time Evaluation

[screenshot]

http://reliefweb.int/report/nepal/nepal-gorkha-earthquake-2015-internal-real-time-evaluation-emergency-health-response

An evaluation approach that captures this is “real time evaluation,” an idea that appears often in humanitarian relief efforts. A disastrous situation that demands rapid alleviation of suffering creates the need for quick information about whether that suffering is indeed being alleviated and, if it isn’t, what might be done so that it is. RTE emphasizes timely evaluation feedback to strengthen program design and implementation. Some common features are:

  • it takes place during implementation
  • it is iterative
  • the time frame is short ~ days, not weeks
  • it relies on secondary sources of information, but also field visits
  • it uses internal ‘consultants’
  • it emphasizes process and immediate lesson-learning
  • ‘quick and dirty’ results enable quick program changes

Digital Portfolios of Student Learning

While traditional report cards have long been the mainstay of reporting student learning, technology that allows ongoing feedback about what and how students are learning is now common. Digital portfolios are collections of evidence managed by users and shared electronically, often on the web but increasingly through other social media platforms. One example is Fresh Grade, an app that facilitates documenting and sharing learning activities and outcomes. Common features of digital portfolios are:

  • user driven (usually students, but also increasingly teachers)
  • shared digitally
  • ongoing representation of learning
  • includes direct evidence
  • keyed to stated criteria and standards
  • modifiable, yet able to stand as an end product for summative evaluation

What can we learn from these examples?

RTE is often done under difficult circumstances with limited ability to collect data first hand and thus is content with ‘quick and dirty’ results. Disaster situations make it onerous to be in the field and evaluation relies on reports from the field (observations of aid workers, sector staff in the area, and so on). On the other hand, classrooms and other educational settings are easy to access, but the data about learning activities and outcomes are similar to reports from the field. Digital portfolios and especially the real time apps (like Fresh Grade) provide immediate evidence of what is going on and what is being accomplished. Apps allow students and teachers to create and share information on an ongoing basis, but permit editing and adding to the record over time. If we think about an individual student’s learning as a model for a program, perhaps this technology has something to offer formative program evaluation.

RTE could use an app or web-based platform (most are available for smartphones and tablets, and there are a number of web-based tools that might serve this purpose: Evernote, Google Drive, Three Ring) so those on the ground could provide data about what is happening by sending photographs, interviews, observations, documents, and so on to evaluators who are unable to collect data firsthand. Connectivity may be an issue in some situations, but even an erratic connection would allow varied and compelling data to be shared, and in non-emergency situations this wouldn’t be a problem at all. Technology that allows for sharing information easily and often may increase the likelihood that adjustments can be made, and thus that the purpose of formative evaluation is realized.
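To make the idea concrete, here is a minimal store-and-forward sketch in Python. Everything in it is hypothetical ~ the FieldReport record, the local outbox file, and the `send` callable simply stand in for whatever upload mechanism a real platform (Evernote, Google Drive, Three Ring, or a purpose-built app) would provide.

```python
import json
import os
import time
from dataclasses import dataclass, asdict, field
from typing import List

# Hypothetical record for one piece of field evidence; the field names are
# illustrative only, not from any particular RTE platform.
@dataclass
class FieldReport:
    author: str
    location: str
    kind: str                 # e.g. "photo", "interview", "observation", "document"
    note: str
    attachments: List[str] = field(default_factory=list)  # local file paths
    timestamp: float = field(default_factory=time.time)

OUTBOX = "outbox.jsonl"  # reports wait here until a connection is available

def queue_report(report: FieldReport) -> None:
    """Append the report to a local outbox so nothing is lost while offline."""
    with open(OUTBOX, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(report)) + "\n")

def flush_outbox(send) -> int:
    """Try to send every queued report; keep the ones that fail for next time."""
    if not os.path.exists(OUTBOX):
        return 0
    with open(OUTBOX, encoding="utf-8") as f:
        pending = [json.loads(line) for line in f if line.strip()]
    still_pending, sent = [], 0
    for record in pending:
        try:
            send(record)      # `send` is whatever upload call the platform provides
            sent += 1
        except OSError:       # no connectivity ~ keep the record queued
            still_pending.append(record)
    with open(OUTBOX, "w", encoding="utf-8") as f:
        for record in still_pending:
            f.write(json.dumps(record) + "\n")
    return sent

# Example: queue an observation now, flush whenever a connection appears.
queue_report(FieldReport(author="field team A", location="district 3",
                         kind="observation", note="clinic reopened, supplies low"))
print(flush_outbox(send=lambda record: print("uploaded:", record["note"])))
```

The point is simply that reports queue locally when there is no connection and are sent whenever one appears, so field staff can keep documenting regardless of connectivity.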

 

Participation in Humanitarian Evaluation


Chris Morris guest blogged on the Impact Ready blog about some findings from his research on the role of evaluation in accountability in interventions and programs in crisis-affected communities. In this post he focuses specifically on the lack of participation by local communities in evaluations that are meant to provide accountability to those most affected. Click here to read the whole post.

student evaluations of teaching ~ a fundamental problem for higher education

Student evaluations of teaching (SET) are standard fare in higher education, where there is often relatively little information available to judge what is supposed to be a major part of professors’ work. There are often long end-of-course surveys students complete, covering everything from whether the content was relevant, to the professor’s competence, to whether students felt fairly treated. Putting students in a role as observers of what happens in classes isn’t a bad idea, and I’ve written about the Measures of Effective Teaching Project here. Not everything that is asked of students is clearly in their realm, though, such as whether the professor knows his/her stuff, and much more care should be given to asking students to observe what is observable and experienced specifically by them. This matters when the data from these surveys are used to make career decisions about professors. The overall evaluation process for tenure, promotion, merit pay, and continuing appointment involves more components than teaching, and indeed scholarship (publishing) is often compensatory and can nullify a poor teaching evaluation. But that’s another post, for another day.

A recent study at NC State points to a specific, and serious, problem with student evals of teaching: they are gender biased. One of the basic issues in instrument construction is validity, and it would appear student evaluations of teaching aren’t living up to this expectation.

“The ratings that students give instructors are really important, because they’re used to guide higher education decisions related to hiring, promotions and tenure,” says Lillian MacNell, lead author of a paper on the work and a Ph.D. student in sociology at NC State. “And if the results of these evaluations are inherently biased against women, we need to find ways to address that problem.”

Benjamin Schmidt, an assistant professor of history at Northeastern University who is interested in data visualization, has given us what might be the most useful data for seeing the effects of gender bias. He used Rate My Professors as the data source, describing his analysis here, and before you get all uppity about this, there is at least some evidence of a reasonably high correlation between ratings on RMP and institutional SETs. Schmidt has created an interactive chart to explore gender, discipline and student ratings. It’s pretty amazing… and eye-opening. I’ve clipped just a few examples of what happens when different terms are used to search the database. Apparently male professors in psychology might have futures as stand-up comedians, all education professors are so caring any of them might be the next Mother Teresa, and all philosophy profs are brilliant, but men more so than women. Not all descriptors are gender biased… attractive, for example, leads us to believe education professors are the least attractive (ouch!) while the hotties are female criminal justice and male philosophy profs… but a surprisingly consistent pattern of gender bias against women across disciplines is apparent.

[interactive chart clips: search for FUNNY; using CARING; BRILLIANT as the search term]

Explore this interactive chart and be amazed, even appalled. You’ll find that men are more likely to be seen as a star, knowledgeable or awesome, and women are more likely to be seen as bossy, disorganized or annoying, and to play favourites, whilst being either nicer or ruder than men. (Wait, is that rudely nice OR nicely rude?)
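Underneath a chart like Schmidt’s is a fairly simple computation: how often a word appears in review text, per million words, split by discipline and instructor gender. Here is a minimal sketch of that computation in Python, using a few made-up toy reviews rather than Schmidt’s actual RateMyProfessors dataset or code.

```python
from collections import defaultdict

# Toy reviews ~ entirely made up; the real chart is built from millions of
# RateMyProfessors reviews, which this sketch does not reproduce.
reviews = [
    ("psychology", "male", "he is so funny and keeps the class laughing"),
    ("psychology", "female", "clear lectures, very organized and caring"),
    ("education", "female", "the most caring professor I have ever had"),
    ("philosophy", "male", "brilliant but a very hard grader"),
]

def rate_per_million(term, records):
    """Uses of `term` per million words of review text, by (discipline, gender)."""
    term = term.lower()
    hits = defaultdict(int)
    words = defaultdict(int)
    for discipline, gender, text in records:
        tokens = text.lower().split()
        key = (discipline, gender)
        words[key] += len(tokens)
        hits[key] += tokens.count(term)
    return {key: 1_000_000 * hits[key] / words[key] for key in words}

# Compare how often "funny" appears in reviews of men vs. women, by discipline.
for key, rate in sorted(rate_per_million("funny", reviews).items()):
    print(key, round(rate, 1))
```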

But then ask if the student ratings have any basis in fact. Are philosophy professors really more brilliant than professors in accounting, business or health sciences? Do women across all disciplines lack a sense of humour? In other words, the problems with these ratings are complex and layered… gender bias, for sure; truth value, disputable. And yet, faculty committees and higher ed administrations persist in using such flawed data, or at least continue to symbolically offer it up as grounds for important personnel decision making.

alternatives to standardized testing

In educational evaluation, the global educational reform movement (GERM) has privileged common indicators of student learning outcomes (used in turn for other evaluation purposes like teacher evaluation, even if that is not a sound practice). There are many reasons why standardized tests become the norm and are reified as the only fair and legitimate way to know how students and schools are doing. There is plenty of literature that debunks that idea.

However, the narrative of standardized testing as a necessary and legitimate means of judging the quality of learning and schooling is powerful and political. In a short commentary for NPR, reporter Anya Kamenetz nicely summarizes reasonable alternatives, and these are excellent talking points when confronted with the question, “If not standardized tests, what then?” You can read the article, but in summary:

1) use some sort of matrix sampling (a good idea from NAEP; see the sketch after this list)

2) consider ongoing embedded assessments (this is usually computer-based testing)

3) think about what you want to know, and accept that it will require multiple measures (in other words, knowing scores in a few subject areas will never be enough, and maybe there are things worth knowing beyond the obvious)

4) start considering novel approaches to assessment, like game-based assessment and the not so novel use of portfolios or narrative evaluations

5) think differently about what it means to judge a school, and that means looking at more than just narrow student outcomes (school inspections are a notion worth revisiting).
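On the first point, matrix sampling spreads a long item pool across many students so that no single student answers everything, yet the school- or district-level picture is still complete. Here is a minimal sketch of the assignment step in Python; the item pool, block size and student roster are all made up for illustration.

```python
import random

# A made-up item pool and roster, just to show the mechanics of matrix sampling:
# each student answers only one short block, but every item in the pool is
# answered by some students, so group-level results can still be estimated.
ITEM_POOL = [f"item_{i:02d}" for i in range(60)]   # hypothetical 60-item pool
BLOCK_SIZE = 12                                    # items any one student answers

def make_blocks(items, block_size):
    """Split the item pool into non-overlapping blocks."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]

def assign_blocks(students, blocks, seed=0):
    """Randomly assign each student one block, keeping block counts balanced."""
    rng = random.Random(seed)
    assignments = {}
    for i, student in enumerate(rng.sample(students, len(students))):
        assignments[student] = blocks[i % len(blocks)]
    return assignments

students = [f"student_{i:03d}" for i in range(200)]
blocks = make_blocks(ITEM_POOL, BLOCK_SIZE)
assignments = assign_blocks(students, blocks)
print(len(blocks), "blocks;", len(assignments["student_000"]), "items per student")
```

Each student’s testing burden stays small, which is exactly what makes the approach attractive as an alternative to giving everyone the full battery.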

Reporting evaluation results ~ the case of school report cards

Report cards are a particular example of reporting evaluation results ~ telling students and their parents how well they are doing at what they are supposed to be learning. Typically, report cards are done three times in a school year and this time of year is about when that first report card comes home. I did a Q & A about report cards, and had this to say:

A UBC education expert explains why rewarding good grades is an automatic fail

Whether it’s straight As or straight Cs, UBC education professor Sandra Mathison says parents should ask plenty of questions about a student’s report card, and cautions against rewarding good grades.

How useful are report cards?

Prof. Sandra Mathison

There are different perceptions about what types of report cards are useful. What educators see as being appropriate is often not a view shared by parents. This is a fundamental contradiction.

Parents see marks within a framework of competition. Knowing that your child is getting an A means they’re in the top group, and if they get a C, then your child is doing a lot worse than some classmates.

For educators, the evolution of report cards is about providing information that is substantive and communicates, more specifically, the knowledge and skills that students are acquiring and demonstrating. That requires more detail. For example, educators can break “reading skills” down into phonemic awareness, comprehension, vocabulary, and so on.

Having detailed information communicates more clearly what kids know, and the move in most report cards is towards that. But parents often push back and say, “That’s too much information. I just want to know if my kid is doing OK.”

Are letter grades helpful?

Most of the time letter grades represent big categories and are therefore devoid of much meaning. If a high-school student gets a B in chemistry, what does that mean? Chemistry is complex. It has many components. Does the student understand the concepts, but struggle with those ideas in an applied laboratory context? Does the student understand the ideas but have a difficult time with nomenclature?

How should parents react to a less than stellar report card?

It’s more appropriate to ask what sense parents should make of report cards in general. A parent whose child gets all As should be asking questions, just like a parent whose child gets all Cs or Ds or Fs. Simply accepting some general level of performance, whether it’s very good or not, suggests you don’t actually want to know anything detailed about what your child is learning.

All parents, regardless of the letter grade, should say, “Help me understand in more detail what my child knows and can do, and what my child doesn’t seem to know and is having trouble doing.”

Ask the teacher to be analytic about the child’s knowledge and performance, and to show the parent work the student has done that shows something they’re struggling with. Then the parent can see and ask, “What can you and I do in order to help this child?”

Is it helpful to reward good grades with money or gifts?

It’s highly problematic. From an educational research perspective, we foster kids learning things because they have intrinsic value, not because they have rewards attached.

Parents and teachers should foster children’s interest in learning for the sake of learning, not the pursuit of grades or external rewards. When you give cash or gifts to students, you’re saying to them that what matters most are the outcomes. And when you put the emphasis on simple outcomes, such as getting an A or a particular score on a standardized test, it becomes a defining feature of learning. The quality of student thinking and their interest in the content is diminished when they focus on getting grades for gifts.

This short piece generated more media attention than I anticipated, perhaps because many of those in the media are themselves parents facing their children’s report cards.

A follow-up to this short piece was my segment on Stephen Quinn’s CBC On the Coast. Listen here.

learning to be an evaluator ~ many ways, many contexts

Many of us naturally think about learning to do evaluation within the context of degree programs, professional development workshops, and sometimes on-the-job training. In so doing, education in evaluation is seen as more limited than is practically the case. Because evaluation is perhaps one of the most common forms of thinking (whether it is done well or not), there is a dizzying array of contexts in which people learn to make judgements about what good is.

Yesterday, hundreds of young people gathered in rural North Carolina to demonstrate their evaluation skills… in dairy cow judging.

Participants are scored based on how well they apply dairy cattle evaluation skills learned in the classroom. Each team evaluated six classes of dairy cattle and defended its reasoning to a panel of judges.

While Future Farmers of America may do cow judging in preparation for careers as dairy farmers, historically the evaluation skills demonstrated were key to selecting the best, most productive and healthy herd upon which the farmer’s livelihood depended.