Some of the problems with module evaluation

By Dr Tim Herrick, School of Education

Please note this blog post was originally written in 2019, and was published on the original Elevate blog.

Understanding how students are using our teaching to support their learning is clearly an essential part
of professionalism within higher education. Conscientious teachers want to know on a personal level
what is going well and what could be developed; systematically, it’s useful to identify issues common
across departments or institutions; and more holistically, it helps us meet our obligation to the
Quality Assurance Agency to know what happens in our classrooms.

So we devote time and energy to student evaluation of teaching, at module, programme, and year level;
and the University has recently provided us with resources to facilitate that process, including standard
questions to ask all students.

And yet, I have concerns about some of what is taken as common practice. These are less about the
specific questions that we ask (although much ink has been spilled on that particular matter) or about other
practicalities, such as the timing of the questions: before or after feedback and grades are received, or,
for courageous departments and individuals, mid-way through the module rather than at the end. My
concerns are more about the systematic biases that may be present in any system of teaching evaluation -
and perhaps more pointedly, how groups of students perceive individual teachers.

A while ago, through an LSE blog post, I was made aware of a piece of research in Innovative Higher
Education calling attention to gender biases in how students evaluate teaching. The study is nicely
designed - it’s based on student evaluations of an online module, with one male teacher and one female.
The online class was divided into four groups - one taught by the male presenting a male identity, and
one taught by the female presenting a female identity. The final two reflect the ingenuity of the design,
as the third was taught by the male under a female identity, and the fourth by the female under a male
identity. So students were experiencing similar, sometimes identical, interactions, with the same
teacher, sometimes under their true gender identity, sometimes under the assumed one, without ever
meeting their teacher in person.

It would be nice to say that this led to consistent student evaluations, regardless of the students’ perceptions
of gender. However: unfortunately not. While all four groups gave positive scores to their “four” instructors,
the scores for the “male” teachers were higher than those for the “female” teachers. As the authors
explain:

"the same instructor, grading under two different identities, was given lower ratings half the time
with the only difference being the perceived gender of the instructor"

And by way of illustration:

"when the actual male and female instructors posted grades after two days as a male, this was
considered by students to be a 4.35 out of 5 level of promptness, but when the same two
instructors posted grades at the same time as a female, it was considered to be a 3.55 out of 5
level of promptness"

These findings were reproduced by the LSE authors mentioned above. The explanation the original article
puts forward is that students have higher expectations of female teachers, wanting them to be
approachable, empathetic, and warm in their communications, as well as knowing their subject matter
and being clear in how it is explained. The bar for men is set lower, leading to that horrible double bind -
women are penalised for not meeting higher expectations, while men are rewarded for doing anything
beyond a lower level of expectation.

So what to do? First of all, recognise the problem - and while we’re there, look hard at the more complex
data on ethnicity and student evaluation of teaching - and if anyone knows of research
on dis/ability, I would be glad to read it. Secondly, share the problem with our students - in this woke
era, I doubt many of them would want to collaborate in a(nother) system that appears to privilege white
men. Encourage them to consider their own presumptions before completing the evaluation form; and
perhaps, as colleagues in Geography have done, hold a conversation with them about what kinds of feedback are most useful
for staff to receive.

And lastly, alongside the standardised evaluation instruments, think of other ways in which we can come
to know how our students are experiencing our teaching (to slip in some quick plugs, there are some great
ideas on the Elevate student engagement webpages, and there is a Student Observation of Teaching scheme
that I run). We may need, for the purposes of quality assurance, to reach a minimum threshold of student
evaluation; but to reduce the impact of systematic biases such as gender, and to hear most clearly students’
suggestions for how teaching might be enhanced, I would argue that we also need to do more.