What is an effect size?

An effect size is a quantitative measure of the size of the difference between two groups. In systematic reviews and meta-analyses of interventions, effect sizes are often calculated as the ‘standardised mean difference’ (SMD) between two groups in a trial. Very roughly, this is the difference between the average score of participants in the intervention group and the average score of participants in the control group, divided by the standard deviation of the scores so that results measured on different scales can be compared.
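As a rough illustration of the calculation, the sketch below works out a standardised mean difference for two small groups, using the pooled standard deviation (the form usually called Cohen's d). The function name and the scores are made up for this example and do not come from any real trial.

```python
import math
import statistics


def cohens_d(intervention, control):
    """Standardised mean difference, using the pooled standard deviation."""
    n1, n2 = len(intervention), len(control)
    mean_difference = statistics.mean(intervention) - statistics.mean(control)
    # statistics.variance() is the sample variance (n - 1 denominator).
    pooled_variance = (
        (n1 - 1) * statistics.variance(intervention)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    return mean_difference / math.sqrt(pooled_variance)


# Hypothetical scores, for illustration only.
intervention_scores = [12, 14, 16, 18, 20]
control_scores = [10, 12, 14, 16, 18]

d = cohens_d(intervention_scores, control_scores)
print(f"d={d:.2f}")
```

Here the intervention group scores 2 points higher on average, and the scores in each group have a standard deviation of about 3.2 points, so the effect size is roughly 0.63.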


Effect sizes are usually reported using the label ‘d=’, and in the form of a decimal number, such as d=0.2 or d=0.5. One of the most common ways of interpreting effect sizes is based on the work of the statistician Jacob Cohen, who suggested that: 0.2 and below = small effect size; 0.5 = medium effect size; 0.8 and above = large effect size. While these interpretations are not uncontroversial and there are other ways to calculate and interpret effect size, Cohen’s suggestions are generally accepted and are a good basis for interpreting the results of trials and for reading systematic reviews and meta-analyses.
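The benchmarks above can be turned into a small lookup, sketched here. Cohen's suggestions only name the points 0.2, 0.5 and 0.8, so how to label values in between is an assumption of this example: a value keeps the label of the benchmark below it until it reaches the next one. The function name is hypothetical.

```python
def interpret_effect_size(d):
    """Label an effect size using Cohen's benchmarks (0.2, 0.5, 0.8).

    Assumption for this sketch: values between benchmarks take the label
    of the lower benchmark, and the sign of d is ignored.
    """
    size = abs(d)
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


for value in (0.2, 0.5, 0.8):
    print(f"d={value}: {interpret_effect_size(value)}")
```

The sign is ignored because a negative d only indicates that the control group scored higher than the intervention group, not that the effect is smaller.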


You might also see studies that report ‘statistical significance’, which tells you whether an observed effect was unlikely to have happened by chance. While it is important to know this, it is not as useful for comparing the results of multiple studies, as we do in systematic reviews. This is because a result that is simply labelled ‘significant’ or ‘not significant’ does not tell you how large the effect was, or how many participants the study had. If two studies are identical except that one has a larger sample size, we would usually consider the study with the larger sample size to be more reliable, but counting up which studies reached statistical significance does not give more weight to a study with more participants: all studies are treated equally.

In a meta-analysis, on the other hand, effect sizes are ‘weighted’ according to the number of participants in each study. For instance, a study with 10 participants might have had a big effect size (such as 0.8), while another study of the same intervention may have had 1000 participants but a small effect size (such as 0.2). If all other things are equal (e.g. both studies had a low risk of bias), then both studies may have shown that the intervention had a statistically significant effect, but the overall effect size would be small (close to 0.2), because the larger of the two studies would be given more ‘weight’.
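The arithmetic behind this example can be sketched as a simple weighted average, with each study's effect size weighted by its number of participants. This is a simplification made for illustration: published meta-analyses typically weight studies by the precision of their estimates (inverse-variance weighting), which depends heavily on sample size but is not identical to it. The function name is hypothetical.

```python
def pooled_effect_size(studies):
    """Average of effect sizes, weighted by number of participants.

    `studies` is a list of (participants, effect_size) pairs.
    Simplified sketch: real meta-analyses usually use inverse-variance weights.
    """
    total_participants = sum(n for n, _ in studies)
    weighted_sum = sum(n * d for n, d in studies)
    return weighted_sum / total_participants


# The two studies from the example above.
studies = [(10, 0.8), (1000, 0.2)]

overall = pooled_effect_size(studies)
print(f"overall d={overall:.2f}")
```

The small study pulls the combined estimate up only slightly, to about 0.21, so the overall effect size would still be interpreted as small.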