Wednesday, 17 December 2014

Sorting the wheat from the (research) chaff. A rough guide.

A few teachers have commented to me this year that they would have no idea how to determine whether a new piece of research is “any good” or represents a change of practice that they should adopt. While it’s probable that many health science practitioners would also say they struggle with the task of critically appraising new research, I think this is a particular challenge for teachers, who, historically, have not been taught about research methods and data analysis in their pre-service training. This leaves teachers vulnerable to “the next new thing” that policy-makers decide to introduce, and makes it hard for them to argue their corner with any confidence.

No blog post can adequately stand in for two, three, four or more years of research methods training, but I thought it might be helpful here to sign-post a few key points to de-mystify some of the research landscape for teachers.

Here are my Top-10 questions to keep in mind when reading about new research:

1.      Where is the study published?
a.     The optimum answer to this question is “In a peer reviewed journal”. By “peer review” we mean that the researchers sent their manuscript to an academic journal editor, the editor considered its general suitability for the journal, and then nominated a couple of academic “peers” to conduct a detailed review. This process is often conducted on a double-blind basis – i.e. the reviewer does not know the author’s identity and vice versa. However, some journals use an open-review process. The distinction is not important for our purposes here. What matters is the level of scrutiny the paper receives, in terms of the theoretical logic behind its rationale, its method, data collection, analysis and interpretation.

 As any academic will attest, this can be a bruising process and we often need to don a metaphorical rhino hide before opening the email with a subject line “MS 2014XYZ Decision” or similar. Reviewers rarely spend a lot of time on the study’s strengths, highlighting instead its flaws and limitations. This is not a game for the faint-hearted.

The upside though is that most published papers have undergone considerable revision by the time they go to print, and the researchers may have had to patiently and painstakingly address a myriad of queries and challenges to their argument.

b.     But not all journals are created equal. Academics are in the know about esteem hierarchies and metrics such as impact factors. Universities are in the know about these as well, and bring considerable pressure to bear on academics to publish in high-impact journals. One problem with this is that such journals may only be read by other academics and never by practitioners “on the ground” – so while it’s gratifying to have your research cited by other researchers, it may not be translating into meaningful, real-life change.

c.       Media reports, blogs and websites often provide accessible, easy to read summaries of research, but should not be the primary source of research. If they are the primary source, you should remember that the rigorous peer-review process outlined above has almost certainly not taken place.

2.      Who are the authors?
a.     Is there a well-qualified academic on the research team? Particular knowledge of research methodology, data collection, analysis and interpretation is needed in order to conduct rigorous research. Look for evidence that this exists in the research team (e.g. a university-affiliated team leader).

b.     Are there potential or actual conflicts of interest for any of the team? Examples of this might be someone employed by a particular publishing house being part of a team that is evaluating an intervention in which that publishing house has a commercial stake. You should also look at the funding source(s) and ask yourself whether there might be vested interests in the data telling a particular story.

c.      What else has this team published? What do we know about their ideological stance / bias? (everyone has one!)

3.      What was the context of the study?
a.    What country was the study conducted in? You might read a fabulous report of a rigorous piece of research that was conducted in Uzbekistan, and be quite confident that it is tight and well-controlled. But if there are significant differences between the Uzbek educational context and your own, you might want to think carefully before adopting any recommended changes.

b.    What is the policy framework in which the study was conducted? Are there particular teaching approaches that are explicitly or implicitly associated with this setting?

c.     What are the demographic  characteristics of the sample? Here we need to think about socio-economic status (SES) factors, ethnicity, culture, religious influences, age and gender characteristics and any other wider influences on the context that might be relevant. The authors might tell you that the study was conducted in “10 schools with similar socio-economic characteristics”, but this doesn’t help you very much if you don’t know what those characteristics were – i.e. were the schools in a disadvantaged area, or were they middle or high-SES? This has important implications for the extent to which findings can be generalised beyond the study  – no matter how rigorous the study itself may have been.

4.      How clearly was the research question stated?
a.    Some studies are highly specific with respect to their purpose and this can be easy to see even from the title. Unfortunately, though, some research studies are a bit like fishing expeditions – the researchers pack their gear and head out into the wild to see what they can find. While it is absolutely appropriate for qualitative studies to take a broader sweep around “exploring and understanding” a phenomenon, you should always have a clear sense of what the researchers are examining and why.

5.      How adequate was the sample and the description of the intervention?
a.     Here we’re interested in issues like sample size (e.g. the number of teachers, students, schools etc) included. There is, however, no simple absolute answer to the question “How many is enough?” Sample size should nonetheless be based on some kind of “power analysis” – a statistical consideration of the nature of the questions asked and the number of participants needed to test a hypothesis. If, for example, we wanted to know about differences in vocabulary size between four year olds and eight year olds, we would expect that age would have a “big effect” and we would need a smaller sample than if we were studying the differences in vocabulary between four year olds and four-and-a-half year olds – here there will be more developmental blurring between the two groups, and so to find an age effect (assuming one actually exists), a larger sample would be needed.

b.    Is there any potential bias/distortion due to sampling processes? An obvious issue in schools-based research is the (usual) requirement for parent/guardian consent. However, it may be that parents from non-English speaking backgrounds cannot adequately understand the Information Sheet and Consent Form, and so decide (quite reasonably!) not to complete them and return them to the school. This will then introduce a systematic bias into the sample, and means findings can really only be generalised to other groups of similar composition.

c.     In studies that involve any kind of pre-post comparison (e.g. collection of baseline data at Time 1, an intervention phase, and collection of follow-up data at Time 2), it’s important to think about retention of participants over time, and most importantly to look at the characteristics of participants who were lost to follow-up. Often, these are from minority groups or have some other defining characteristic (e.g., frequent suspensions due to behaviour problems) that might in itself influence the Time 2 scores.

d.    If it is an intervention study, how were participants allocated to study arms (research vs control)? Ideally this should occur via a process of randomisation, so that potentially confounding variables (e.g. ethnicity, IQ) are, on average, equally distributed across study arms and so are “cancelled out” in the analysis. In some medical research, it is possible to conduct “double blind” trials, in which neither the participants nor the researcher interacting with them is aware of who is in which group. This is harder to do in schools, for obvious reasons, but in general, you should look for evidence that the researchers did not influence the allocation of individuals or schools to one study arm or the other.


e.   Also with intervention studies, ask yourself about the basis of the intervention. Does it have a theoretical rationale that draws on previous research, or is it just someone's idea about what might work? There's unfortunately been way too much of the latter in education. You should also ask whether the intervention was delivered as intended (so-called fidelity) and whether anything else might have happened during the intervention that could independently account for an apparent improvement in student performance.

6.      How suitable are the measures for the questions asked?
a.     If I told you that I was going to measure children’s IQs, and then proceeded to take out a tape measure and record their head circumferences, I think you would rightly howl me down for using an inappropriate measure of IQ. Fortunately, extremely poor choices such as this are not common, though it is common for researchers to select assessment tools that others consider to lack validity (accuracy) or reliability (consistency and trustworthiness). We also need to consider how current the measures are and whether they are widely known and well-regarded.

b.    Who conducted the assessments / measurements?  Just as we don’t want doctors interviewing their own patients about the acceptability of a new treatment, we don’t want teachers assessing their own students. Humans are prone to all sorts of conscious and unconscious bias, whether as the observer (see Rosenthal Effect) or as the observed (see Hawthorne Effect).

7.      How clearly are the results presented?
a.     Are all of the results presented, or just some of them?

b.    Often it’s necessary to have a good grasp of statistics to wade through this section of a paper, so don’t be put off if you don’t feel you bring the necessary background knowledge to the table. If necessary, consult with someone who is more confident with this territory, but persevere with other sections of the paper. Many academics would probably privately admit that they don't give this part of the paper the focus they should – which is a shame as it’s often the most difficult to write!

8.      When results are discussed, is a range of possibilities canvassed to account for the findings, or do the authors just stick with their original hypothesis?

a.     Unfortunately there is a well-known bias in what gets published, and findings that don’t sit well with researcher bias and/or the prevailing zeitgeist often just don’t see the light of day. Happily though, that is beginning to change, and academics and journal editors alike are a little more open to publishing findings that might be unexpected. Here we want to see a range of possibilities being canvassed, and the importance of future replication studies being noted. The language used should be appropriately cautious and circumspect, e.g., "These findings suggest....", or "Our results are consistent with the notion that .....".

9.      Are limitations acknowledged and addressed?
a.    All research has limitations, and most researchers are acutely aware of this when they submit a paper to a journal (if they weren't beforehand, the review process normally fixes that!). So you should expect that some limitations and their potential importance are considered (e.g.  small or biased sample, limited follow-up time).

10. Are implications for theory, practice, policy and/or further research stated?
a.      The purpose of research is to effect change  - in at least one of theory, practice, and policy. So the authors should present some ideas about the implications of their work (without over-reaching of course) and should make constructive suggestions as to how other researchers can advance the field even further.
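
To make the power-analysis idea in point 5a a little more concrete, here is a rough sketch in Python. It uses a standard normal-approximation formula for comparing the average scores of two groups. The function name `n_per_group`, the effect sizes (0.8 standing in for a “big” effect, 0.2 for a small one), the 5% significance level and the 80% power are all my own illustrative assumptions, not figures from any particular study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of participants needed in EACH of two groups.

    Normal-approximation formula for a two-sided comparison of two means:
        n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size) ** 2
    effect_size is the expected difference between the groups, expressed
    in standard-deviation units (hypothetical values are used below).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # threshold for significance
    z_power = NormalDist().inv_cdf(power)          # threshold for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical "big effect", e.g. four year olds vs eight year olds
big_effect_n = n_per_group(0.8)

# Hypothetical small effect, e.g. four vs four-and-a-half year olds
small_effect_n = n_per_group(0.2)

print(f"Big effect:   about {big_effect_n} children per group")
print(f"Small effect: about {small_effect_n} children per group")
```

Under these assumptions, the big-effect comparison needs roughly 25 children per group, while the small-effect comparison needs close to 400. A real study would normally use dedicated software and a calculation matched to its own design, so treat this only as an illustration of why smaller expected differences demand larger samples.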

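
Similarly, the randomisation described in point 5d can be sketched in a few lines. This is a minimal illustration of simple random allocation to two arms, assuming a made-up list of participant IDs; the function name `randomise` and the arm labels are hypothetical, and real trials often use more elaborate schemes (for example, allocating whole schools rather than individual students).

```python
import random


def randomise(participants, seed=None):
    """Randomly split participants into two study arms of (near-)equal size.

    The shuffle, not the researcher, decides who ends up in which arm.
    A seed is optional and only makes the allocation reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs, for illustration only
students = [f"student_{i:02d}" for i in range(1, 21)]
arms = randomise(students, seed=2014)

print("Intervention arm:", sorted(arms["intervention"]))
print("Control arm:     ", sorted(arms["control"]))
```

The point to notice is that nobody chooses who goes where, which is exactly the evidence you should look for in a paper's description of its allocation process.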

This is by no means an exhaustive guide to critical appraisal, nor is it intended to be. It is, however, intended to guide the novice and instil some confidence that even without detailed statistical knowledge, you can still be an astute consumer of new research.

Remember too that we rarely change tack on the basis of one study. Instead we rely on consistent trends in well-conducted research to guide policy and practice – so look out for systematic reviews or meta-analyses, both of which pool findings on a particular question and synthesise the current state of the evidence.

I would recommend that all teachers bookmark the Macquarie University MUSEC Briefings page, as this open-access site provides reliable, independent assessments of a range of approaches that may or may not be well-supported by research evidence.


As in all things though, it is wise to remember that when something seems too good to be true ..... it probably is. 

"Trust me, I'm a researcher" is never enough.



(c) Pamela Snow 2014
