90% of the T Distribution

William Sealy Gosset was great. He improved beer at Guinness by applying the statistics that existed at the time. Not content with that, he invented new statistics to brew even better beer. His inventions are used all over the place now, but Guinness wanted to keep him as a secret weapon, so they made him publish his results under the pseudonym Student.

One thing Gosset realised is that it is wrong to compute a 90 % confidence interval for the mean by taking the standard deviation of the sample and assuming a normal distribution, like-a-so:

\[\hat{\mu} \pm 1.645 \hat{\sigma}\]

When we do this, we get too narrow a range, because while we recognise that \(\hat{\mu}\) is just an estimate, we are assuming that \(\sigma = \hat{\sigma}\) holds with certainty!

Gosset came up with correction tables, indexed by the number of samples behind the estimate, to account for our uncertainty about how well \(\hat{\sigma}\) approximates \(\sigma\). Here are some useful values, rounded to be easier to memorise:

| Number of samples | Correction factor for 90 % interval |
|---|---|
| 2 | 4× |
| 3 | 2× |
| 4 | 1.5× |
| 5 | 1.3× |
| 6–8 | 1.2× |
| 9–20 | 1.1× |
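The rounded factors in the table appear to be the ratio between Student's t multiplier for a two-sided 90 % interval with \(n - 1\) degrees of freedom and the normal multiplier 1.645. For example, with 2 samples the t multiplier is about 6.31, and 6.31/1.645 ≈ 3.8, which the table rounds to 4×. The sketch below recomputes these ratios using only the Python standard library: it integrates the t density numerically with Simpson's rule and inverts it by bisection rather than calling a statistics package. The function names (`t_cdf`, `t_quantile`, `correction_factor`) are made up for illustration.

```python
import math
from statistics import NormalDist

# Normal multiplier for a two-sided 90 % interval (about 1.645).
Z_90 = NormalDist().inv_cdf(0.95)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for Student's t with `df` degrees of freedom, valid for x >= 0.

    Uses symmetry (P(T <= 0) = 0.5) plus Simpson's rule on the density from 0 to x.
    """
    # Normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via lgamma.
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                     - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # `steps` must be even for Simpson's rule
    area = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + area * h / 3


def t_quantile(p: float, df: int) -> float:
    """Inverse of t_cdf for 0.5 <= p < 1, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def correction_factor(n_samples: int) -> float:
    """Ratio of the t multiplier (n - 1 degrees of freedom) to the normal one."""
    return t_quantile(0.95, n_samples - 1) / Z_90


if __name__ == "__main__":
    for n in range(2, 21):
        print(f"{n:2d} samples: {correction_factor(n):.2f}x")
```

Run as a script, it prints roughly 3.84× for 2 samples, 1.43× for 4 and 1.05× for 20; the table rounds these to memorable values, sometimes upwards (3.84 becomes 4×, 1.43 becomes 1.5×). If SciPy is available, `scipy.stats.t.ppf(0.95, n - 1)` gives the same t multiplier directly.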

To use this table, count how many samples the estimate of the standard deviation is based on, multiply the estimated standard deviation \(\hat{\sigma}\) by the correction factor, and then multiply by 1.645 to get the half-width of a 90 % interval. With more than 20 samples, the naïve estimate of the standard deviation is good enough for a 90 % interval.
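The procedure can be sketched as a small Python function. This assumes the rounded table is encoded as a list of (largest sample count covered by the row, factor) pairs; the names `CORRECTIONS`, `correction_for` and `interval_90` are hypothetical. It follows the rule as described: correction factor, times 1.645, times \(\hat{\sigma}\), added to and subtracted from \(\hat{\mu}\).

```python
# Rounded correction factors from the table, keyed by the largest sample
# count each row covers (a hypothetical encoding of the table).
CORRECTIONS = [(2, 4.0), (3, 2.0), (4, 1.5), (5, 1.3), (8, 1.2), (20, 1.1)]
Z_90 = 1.645  # normal multiplier for a two-sided 90 % interval


def correction_for(n_samples: int) -> float:
    """Look up the correction factor; above 20 samples no correction is applied."""
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate a standard deviation")
    for max_n, factor in CORRECTIONS:
        if n_samples <= max_n:
            return factor
    return 1.0


def interval_90(mean: float, sd: float, n_samples: int) -> tuple[float, float]:
    """90 % interval as described: mean ± correction × 1.645 × sd."""
    half_width = correction_for(n_samples) * Z_90 * sd
    return mean - half_width, mean + half_width
```

For instance, `interval_90(32, 8, 7)` comes out at about (16.2, 47.8), while leaving out the correction gives about (18.8, 45.2).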

Thus, if we have 7 samples and these have led us to estimate a mean of 32 minutes with a standard deviation of 8 minutes, we should not think of the 90 % confidence interval as

\[ 32 \pm 8×1.645\]

but rather as

\[ 32 \pm 8×1.2×1.645\]