The standard deviation (SD) is a measure of how spread out a dataset is. It tells us how far items in the dataset are, on average, from the mean. So the larger the SD, the more spread out the data are.

Though the SD is an average, it is an average calculated in a somewhat unusual way – and if you don’t want the technical detail you should skip to the next paragraph now. For the dataset {2, 3, 5, 8, 12} the mean is 6. So the deviations from the mean are {–4, –3, –1, 2, 6}. We square these deviations, {16, 9, 1, 4, 36}, and take their average, 13.2 (a quantity known as the variance). Finally we take the square root of 13.2 to give us the SD, 3.63.
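The steps of the worked example can be reproduced in a few lines of Python. The figures and dataset are from the text above; the function name `population_sd` is just illustrative:

```python
import math

def population_sd(data):
    """Population standard deviation, following the steps in the text."""
    mean = sum(data) / len(data)                    # mean of {2, 3, 5, 8, 12} is 6
    squared_devs = [(x - mean) ** 2 for x in data]  # {16, 9, 1, 4, 36}
    variance = sum(squared_devs) / len(data)        # average of the squares: 13.2
    return math.sqrt(variance)                      # SD = sqrt(13.2) ≈ 3.63

sd = population_sd([2, 3, 5, 8, 12])
print(round(sd, 2))  # 3.63
```

(Statistics texts also define a "sample" SD that divides by one less than the number of items; for a whole population, the version above is the one described in the text.)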

One of the best ways to think about the standard deviation is in terms of some ‘rules of thumb’. Suppose we have a large dataset, or a whole population, and suppose that the distribution is the common bell-shape or Normal curve as shown in the graph above. Then the following statements are generally pretty accurate:

- About 2/3 of the data lie within 1 SD of the mean.
- About 95% of the data lie within 2 SDs of the mean.
- Almost all of the data (about 99.7%) lie within 3 SDs of the mean.

As an example, consider intelligence quotient or IQ (which remains popular despite the doubts of many psychologists and educationalists). IQ is usually measured on a scale with mean 100 and standard deviation 15, and the distribution of IQs in a population is, to a good approximation, Normal. We can therefore say that about 2/3 of people will have an IQ within 1 SD, that is 15 units, of the mean; so 2/3 of IQs will be between 85 and 115. About 95% of people will have IQs within 2 SDs of the mean, that is between 70 and 130. And it will be very rare to have an IQ more than 3 SDs from the mean, that is below 55 or above 145 (it’s about 1/8 of 1% in each case). The requirement to join Mensa is an IQ in the top 2% of the distribution; that amounts to an IQ just a little more than 2 SDs above the mean – about 131.
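The IQ figures above can be checked with Python’s standard library, which provides a Normal distribution in the `statistics` module. The mean of 100 and SD of 15 are from the example; everything else follows from the Normal curve:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Fraction of people within 1, 2 and 3 SDs of the mean
within_1sd = iq.cdf(115) - iq.cdf(85)  # about 2/3
within_2sd = iq.cdf(130) - iq.cdf(70)  # about 95%
within_3sd = iq.cdf(145) - iq.cdf(55)  # almost everyone

# IQ needed to be in the top 2% of the distribution (the Mensa requirement)
mensa_cutoff = iq.inv_cdf(0.98)        # "about 131"

print(round(within_1sd, 3))    # 0.683
print(round(within_2sd, 3))    # 0.954
print(round(within_3sd, 3))    # 0.997
print(round(mensa_cutoff, 1))  # 130.8
```

Note that the computed cutoff, 130.8, matches the text’s “a little more than 2 SDs above the mean – about 131”.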