Think back: when listening to debates, how often have you heard people state that “x is linked to y”? Here ‘y’ could be, for example, cancer or the economic slump, and ‘x’ anything from pollution levels to bacon consumption, low confidence to the weather. In saying that the two are linked, speakers are really only referring to an association, a statistical pattern, between them. But the implication, sometimes implicit, sometimes explicit, is that ‘x’ causes ‘y’.

But where is the evidence? The job of statistical tests is to tell us whether a correlation between two measurable things (what statisticians term ‘variables’) could plausibly be down to coincidence or is statistically significant. But even a strong, statistically significant correlation does not prove causation.
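
As a rough illustration of what such a test does, here is a minimal sketch of a permutation test in Python. It is one simple approach among many, not the method any particular study uses. The pollution and illness figures are invented for the example, and the function names are hypothetical. The idea is to shuffle one variable many times and count how often chance alone produces an association at least as strong as the one observed.

```python
import random

def covariance_sum(xs, ys):
    """Sum of products of deviations from the means (the numerator of r)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def permutation_p_value(xs, ys, shuffles=2000, seed=0):
    """Estimate how often shuffled data looks at least as associated as the real data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(covariance_sum(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(covariance_sum(xs, shuffled)) >= observed - 1e-9:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (shuffles + 1)

# Invented figures, for illustration only.
pollution = [12, 18, 25, 31, 40, 47, 55, 63]
illness_rate = [3.1, 3.4, 4.0, 4.2, 5.1, 5.3, 6.0, 6.6]

p = permutation_p_value(pollution, illness_rate)
print(f"Estimated p-value: {p:.4f}")
```

A small p-value here says only that coincidence is an unlikely explanation for the pattern. It says nothing about whether pollution causes the illness.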

The well-worn phrase “correlation does not prove causation” is, in itself, correct, but when used as ammunition in debate it usually marks the end of informed discussion. Often you get a strong sense at this point that nobody is sure who is bluffing and who really knows more. That’s because, more often than not, the entire discussion is based on correlational data, and it’s a big leap from finding that there is an association between two things to knowing that one actually causes the other.

Bringing ‘correlation’ into a debate is pointless unless you know something about the strength (how close) and the direction (positive or negative) of the relationship between the two things you are discussing.

In short, you, and whoever you are debating with, need to know:

a) whether there is positive correlation (i.e. as one thing increases, so does the other) or negative correlation (as one increases, the other decreases)

b) the correlation coefficient (symbolised as ‘r’, a figure which will always be between -1.0 and +1.0), which tells you how close the association is between the two things. When there is a causal link, this measure can begin to help you predict outcomes and plan interventions. A very simple example would be an electricity plant planning for higher output during a cold spell, based on the correlation between cold weather and high demand for electricity

c) the regression coefficient. Again, providing there is a causal relationship between two things, this figure describes how the two things are connected, specifically how much one thing is expected to change when the other changes by one unit
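
To make the three points above concrete, here is a minimal Python sketch that computes the correlation coefficient and the regression coefficient (the slope of a straight-line fit) by hand. The temperature and demand figures are invented for illustration, loosely following the electricity example, and the function and variable names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient: direction and closeness of a linear association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def regression_slope(xs, ys):
    """Regression coefficient: expected change in y for a one-unit change in x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented figures: daily temperature (°C) and electricity demand (MW).
temps_c = [-5, 0, 5, 10, 15, 20]
demand_mw = [980, 900, 810, 740, 650, 590]

r = pearson_r(temps_c, demand_mw)
slope = regression_slope(temps_c, demand_mw)

direction = "positive" if r > 0 else "negative"
print(f"a) direction: {direction}")
print(f"b) r = {r:.3f}")
print(f"c) slope = {slope:.1f} MW per degree")
```

With these made-up numbers the correlation is negative and close to -1: as the temperature rises, demand falls. The slope then puts a size on that relationship in real units, which is what a planner would actually need.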

It’s also useful to know the effect size. For example, there might be a strong correlation between two things but, in practice, the actual number of imports and exports, sick people, etc. involved in the matter under debate might be very small. Then again, a small number might still be very important and carry huge impact.
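
The gap between a strong correlation and a small effect can be shown with a short Python sketch. The figures below are entirely invented (they are not real health data) and the names are hypothetical. The point is only that ‘r’ can be almost perfect while the change in real-world units is tiny.

```python
from math import sqrt

def r_and_slope(xs, ys):
    """Return (correlation coefficient, regression slope) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx

# Invented figures: bacon rashers eaten per week, and cases of an
# illness per 100,000 people.
rashers_per_week = [0, 2, 4, 6, 8, 10]
cases_per_100k = [50.00, 50.02, 50.04, 50.06, 50.08, 50.10]

r, slope = r_and_slope(rashers_per_week, cases_per_100k)
extra_cases = slope * (max(rashers_per_week) - min(rashers_per_week))

print(f"r = {r:.3f}")
print(f"Extra cases per 100,000 across the whole range: {extra_cases:.2f}")
```

Here the association is about as close as it can be, yet going from no bacon to ten rashers a week corresponds to only a fraction of one extra case per 100,000 people. Whether that matters is a judgement about impact, not something the correlation coefficient can tell you.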

Our advice? Just as statisticians don’t do one-off ‘yes/no’ tests, it’s your duty (!) to keep probing, building up your evidence incrementally to get nearer and nearer to the truth of the matter. Above all, don’t crumble when you hear the terms ‘correlation’ and ‘causation’ in the same sentence. Be ready to ask some or all of the following questions: