A few teachers
have commented to me this year that they would have no idea how to determine
whether a new piece of research is “any good” or represents a change of
practice that they should adopt. While it’s probable that many
health science practitioners would also say they struggle with the task of
critically appraising new research, I think this is a particular challenge for
teachers, who historically have not been taught about research methods and
data analysis in their pre-service training. This leaves teachers vulnerable to
“the next new thing” that policy-makers decide to introduce, and makes it hard
for them to argue their corner with any confidence.
No blog post can
adequately stand in for two, three, four or more years of research methods
training, but I thought it might be helpful here to signpost a few key points
to demystify some of the research landscape for teachers.
Here are my
top 10 questions to keep in mind when reading about new research:
1. Where is the
study published?
a. The optimum answer
to this question is “In a peer reviewed journal”. By “peer review” we mean that the
researchers sent their manuscript to an academic journal editor, the editor
considered its general suitability for the journal, and then nominated a couple
of academic “peers” to conduct a detailed review. This process is often
conducted on a double-blind basis – i.e. the reviewer does not know the author’s
identity and vice versa. However, some journals use an open-review process. The
distinction is not important for our purposes here. What matters is the level
of scrutiny the paper receives, in terms of the theoretical logic behind its
rationale, its method, data collection, analysis and interpretation.
As any academic will attest, this can be a bruising process and we often need to don a metaphorical rhino hide before opening the email with a subject line “MS 2014XYZ Decision” or similar. Reviewers rarely spend a lot of time on the study’s strengths, highlighting instead its flaws and limitations. This is not a game for the faint-hearted.
The upside, though, is that most published papers have undergone considerable revision by the time they go to print, and the researchers may have had to patiently and painstakingly address a myriad of queries and challenges to their argument.
b. But not all journals
are created equal. Academics are in the know about esteem hierarchies and metrics
such as impact factors. Universities are in the know about these as well, and bring
considerable pressure to bear on academics to publish in high-impact journals. One problem with this is that such journals may only be read by other academics and never by practitioners “on the ground” – so while it’s gratifying
to have your research cited by other researchers, it may not be translating
into meaningful, real-life change.
c. Media reports, blogs
and websites often provide accessible, easy-to-read summaries of research, but
should not be your primary source of information about a study. If they are the primary source,
you should remember that the rigorous peer-review process outlined above has
almost certainly not taken place.
2. Who are the
authors?
a. Is there a well-qualified
academic on the research team? Particular knowledge of research methodology,
data collection, analysis and interpretation is needed in order to conduct rigorous
research. Look for evidence that this exists in the research team (e.g. a
university-affiliated team leader).
b. Are there potential or
actual conflicts of interest for any of the team? Examples of this might be
someone employed by a particular publishing house being part of a team that is
evaluating an intervention in which that publishing house has a commercial stake.
You should also look at the funding source(s) and ask yourself whether there
might be vested interests in the data telling a particular story.
c. What else has this
team published? What do we know about their ideological stance / bias?
(Everyone has one!)
3. What was the
context of the study?
a. What country was the
study conducted in? You might read a fabulous report of a rigorous piece of
research that was conducted in Uzbekistan, and be quite confident that it is
tight and well-controlled. But if there
are significant differences between the Uzbek educational context and your
own, you might want to think carefully before adopting any recommended changes.
b. What is the policy framework
in which the study was conducted? Are there particular teaching approaches that
are explicitly or implicitly associated with this setting?
c. What are the demographic
characteristics of the sample? Here we
need to think about socio-economic status (SES) factors, ethnicity, culture,
religious influences, age and gender characteristics and any other wider
influences on the context that might be relevant. The authors might tell you
that the study was conducted in “10 schools with similar socio-economic characteristics”,
but this doesn’t help you very much if you don’t know what those characteristics
were – i.e. were the schools in a disadvantaged area, or were they middle
or high-SES? This has important implications for the extent to which findings
can be generalised beyond the study – no
matter how rigorous the study itself may have been.
4. How clearly was
the research question stated?
a. Some studies are
highly specific with respect to their purpose and this can be easy to see even
from the title. Unfortunately, though, some research studies are a bit like
fishing expeditions – the researchers pack their gear and head out into the
wild to see what they can find. While it is absolutely appropriate for
qualitative studies to take a broader sweep around “exploring and understanding”
a phenomenon, you should always have a clear sense of what the researchers are
examining and why.
5. How adequate was
the sample and the description of the intervention?
a. Here we’re
interested in issues like sample size (e.g. the number of teachers, students,
schools etc.) included. There is no simple, absolute answer to the
question “How many is enough?”, but sample size should be based on some
kind of “power analysis” – a statistical consideration of the nature of the
questions asked and the number of participants needed to test a hypothesis.
If, for example, we wanted to know about differences in vocabulary size
between four-year-olds and eight-year-olds, we would expect that age would have
a “big effect” and we would need a smaller sample than if we were
studying the differences in vocabulary between four-year-olds and
four-and-a-half-year-olds – here there will be more developmental blurring
between the two groups, and so to find an age effect (assuming one actually
exists), a larger sample would be needed.
b. Is there any potential
bias/distortion due to sampling processes? An obvious issue in schools-based
research is the (usual) requirement for parent/guardian consent. However, it may
be that parents from non-English speaking backgrounds cannot adequately
understand the Information Sheet and Consent Form, and so decide (quite reasonably!) not to complete and return them to the school. This
then introduces a systematic bias into the sample, and means findings can
really only be generalised to other groups of similar composition.
c. In studies that
involve any kind of pre-post comparison (e.g. collection of baseline data at
Time 1, an intervention phase, and collection of follow-up data at Time 2), it’s
important to think about retention of participants over time, and most
importantly to look at the characteristics of participants who were lost to
follow-up. Often, these are from minority groups or have some other defining
characteristic (e.g., frequent suspensions due to behaviour problems) that might in itself influence the Time 2 scores.
d. If it is an
intervention study, how were participants allocated to study arms (intervention vs
control)? Ideally this should occur via a process of randomisation, so that
potentially confounding variables (e.g. ethnicity, IQ) are equally distributed
across study arms and so are “cancelled out” in the analysis. In some medical
research, it is possible to conduct “double blind” trials, in which neither the
participants nor the researcher interacting with them is aware of who is in
which group. This is harder to do in schools, for obvious reasons, but in
general, you should look for evidence that the researchers did not influence
the allocation of individuals or schools to one study arm or the other.
e. Also with intervention studies, ask yourself about the basis of the intervention. Does it have a theoretical rationale that draws on previous research, or is it just someone's idea about what might work? There's unfortunately been way too much of the latter in education. You should also ask whether the intervention was delivered as intended (so-called fidelity) and whether anything else might have happened during the intervention that could independently account for an apparent improvement in student performance.
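For readers who like to see the arithmetic, the power analysis idea in 5a can be sketched in a few lines of Python. This is only a rough illustration, not a substitute for a statistician: it uses a common normal-approximation formula for comparing two group means, and the effect sizes (0.8 for a "big" effect, 0.2 for a small one) are conventional rules of thumb that I have assumed for the example, not figures from any vocabulary study. The function name and default values are likewise hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed in EACH of two groups.

    Normal-approximation formula for a two-sided comparison of two means:
        n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    effect_size is the standardised difference between the groups
    (difference in means divided by the standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up: you cannot recruit part of a child


# Assumed, illustrative effect sizes (not taken from real data):
big_effect = 0.8    # e.g. four-year-olds vs eight-year-olds
small_effect = 0.2  # e.g. four-year-olds vs four-and-a-half-year-olds

print(sample_size_per_group(big_effect))    # 25 per group
print(sample_size_per_group(small_effect))  # 393 per group
```

Under these assumptions, the smaller the expected difference between the groups, the more participants are needed to detect it, which is exactly the point made about developmental blurring above.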
6. How suitable are
the measures for the questions asked?
a. If I told you that I
was going to measure children’s IQs, and then proceeded to take out a tape
measure and record their head circumferences, I think you would rightly howl me
down for using an inappropriate measure of IQ. Fortunately, extremely poor
choices such as this are not common, though it is common for researchers to select
assessment tools that others consider to lack validity (accuracy) or reliability
(consistency and trustworthiness). We also need to consider how current the
measures are and whether they are widely known and well-regarded.
b. Who conducted the
assessments / measurements? Just as we
don’t want doctors interviewing their own patients about the acceptability of a
new treatment, we don’t want teachers assessing their own students. Humans are
prone to all sorts of conscious and unconscious bias, whether as the observer
(see Rosenthal Effect) or as the observed (see Hawthorne Effect).
7. How clearly are
the results presented?
a. Are all of the
results presented, or just some of them?
b. Often it’s necessary
to have a good grasp of statistics to wade through this section of a paper, so
don’t be put off if you don’t feel you bring the necessary background knowledge
to the table. If necessary, consult with someone who is more confident with
this territory, but persevere with other sections of the paper. Many academics
would probably privately admit that they don't give this part of the paper the
focus they should – which is a shame as it’s often the most difficult to write!
8. When results are
discussed, are a range of possibilities canvassed to account for the findings,
or do the authors just stick with their original hypothesis?
a. Unfortunately there
is a well-known bias in what gets published, and findings that don’t sit well
with researcher bias and/or the prevailing zeitgeist often just don’t see the
light of day. Happily though, that is beginning to change, and academics and
journal editors alike are a little more open to publishing findings that might
be unexpected. Here we want to see a range of possibilities being canvassed,
and the importance of future replication studies being noted. The language used should be appropriately cautious and circumspect, e.g., "These findings suggest...", or "Our results are consistent with the notion that ...".
9. Are limitations
acknowledged and addressed?
a. All research has limitations,
and most researchers are acutely aware of this when they submit a paper to a
journal (if they weren't beforehand, the review process normally fixes that!). So
you should expect that some limitations and their potential importance are
considered (e.g. small or biased sample,
limited follow-up time).
10. Are implications
for theory, practice, policy and/or further research stated?
a. The purpose of
research is to effect change - in
at least one of theory, practice, and policy. So the authors should present
some ideas about the implications of their work (without over-reaching of
course) and should make constructive suggestions as to how other researchers
can advance the field even further.
This is by no means an exhaustive guide to critical appraisal, nor is it intended to be. It is, however, intended to guide the novice and instil some confidence that even without detailed statistical knowledge, you can still be an astute consumer of new research.
Remember too,
that we rarely change tack on the basis of one study. Instead, we rely on
consistent trends in well-conducted research to guide policy and practice – so
look out for systematic reviews or meta-analyses, both of which pool findings on a particular question and synthesise the
current state of the evidence.
I would
recommend that all teachers bookmark the Macquarie University MUSEC Briefings page, as this open-access site provides reliable, independent assessments of a range of approaches that
may or may not be well-supported by research evidence.
As in all
things though, it is wise to remember that when something seems too good to be true ... it probably is.
"Trust me, I'm a researcher" is never enough.
(c) Pamela Snow 2014