Questions come up a lot about how to interpret these things. We can build an understanding from the ground up, starting with a few basic metrics that are commonly calculated off of the data.

Measures of Central Tendency

Sometimes we want a measure of central tendency, or a good typical value that you can expect to observe given the data.


The mean is the sum of all the values divided by the number of values.


The median is the middle value (or average of the two middle values) when you order the data sequentially.


The mode is the value that occurs most frequently.
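
As a rough illustration, the three measures can be computed with Python's standard `statistics` module. The `data` list below is a made-up sample used only for this sketch; it is not the data discussed in this article.

```python
import statistics

# Hypothetical sample, invented for illustration (not the article's data).
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of all the values divided by the number of values.
mean = statistics.mean(data)

# Median: with an even count, the average of the two middle values (3 and 5).
median = statistics.median(data)

# Mode: the value that occurs most frequently (3 appears twice).
mode = statistics.mode(data)

print(mean, median, mode)
```

Here the mean (5) sits above the median (4) because the single large value 10 pulls the mean upward, while the median only depends on the ordering.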

Measures of Dispersion

Sometimes a measure of central tendency is not enough. We also want to understand how much variability there is in the data around the central tendency. Enter variance.


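The variance is built from the squared deviations from the mean. For a population, we sum the squared deviations and divide by the number of observations. If we are dealing with a sample (as we are here), we divide that sum by the number of observations minus one instead, which corrects the downward bias that comes from measuring deviations around the sample mean rather than the population mean. This is the sample variance. Since the deviations are squared, we’re not really operating in the same units as the data.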

Standard Deviation

The standard deviation is the square root of the variance. By taking the square root this metric can be expressed in the same units as the data.
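
A minimal sketch of the variance and standard deviation calculations, written out by hand and then checked against the standard `statistics` module. The `data` list is hypothetical. Both divisors are shown: `statistics.pvariance` divides by the number of observations, while `statistics.variance` and `statistics.stdev` divide by the number of observations minus one.

```python
import math
import statistics

# Hypothetical sample, invented for illustration (not the article's data).
data = [2, 3, 3, 5, 7, 10]
n = len(data)
mean = sum(data) / n

# Sum of the squared deviations from the mean.
sum_squared_deviations = sum((x - mean) ** 2 for x in data)

# Dividing by n treats the data as the whole population.
population_variance = sum_squared_deviations / n

# Dividing by n - 1 gives the usual sample variance.
sample_variance = sum_squared_deviations / (n - 1)

# The square root brings the metric back to the units of the data.
sample_sd = math.sqrt(sample_variance)

print(population_variance, sample_variance, sample_sd)
```

The two variances differ noticeably for a sample this small; the gap shrinks as the number of observations grows.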

Standard Error

The standard error is the standard deviation of a statistic’s sampling distribution (the distribution of a statistic that behaves like a random variable, like the mean). So if it were possible to draw multiple samples from the population and compute the mean of each sample (the sample mean) then the standard deviation of all those means is the standard error of the mean. In most situations you can’t do that so you compute the standard error as the standard deviation of the sample divided by the square root of the number of observations.

This formula works because of how the variance of the sample mean behaves as the sample gets larger. The larger the sample you draw from the population, the more closely the sample mean approximates the population mean. This means that the sampling distribution will cluster more tightly around the population mean. The variance of the mean of $n$ independent observations is $\sigma^2/n$, so dividing the variance by the sample size (equivalently, dividing the standard deviation by $\sqrt{n}$) gives this behavior.
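
The idea can be checked with a small simulation, where drawing many samples is possible because we control the population. This is only a sketch: the population parameters `mu` and `sigma`, the sample size `n`, and the helper `draw_sample` are all assumptions made up for the example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical normal population, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25

def draw_sample():
    """Draw one sample of n observations from the hypothetical population."""
    return [random.gauss(mu, sigma) for _ in range(n)]

# The route that is usually impossible in practice:
# draw many samples and take the standard deviation of their means.
sample_means = [statistics.fmean(draw_sample()) for _ in range(5000)]
empirical_se = statistics.stdev(sample_means)

# The practical route: one sample, its standard deviation over sqrt(n).
one_sample = draw_sample()
estimated_se = statistics.stdev(one_sample) / math.sqrt(n)

# What the formula gives when the population sigma is known.
theoretical_se = sigma / math.sqrt(n)

print(f"std. dev. of simulated means: {empirical_se:.3f}")
print(f"estimate from a single sample: {estimated_se:.3f}")
print(f"sigma / sqrt(n):               {theoretical_se:.3f}")
```

The standard deviation of the simulated means should land close to `sigma / sqrt(n)`, while the single-sample estimate is noisier because it relies on one sample's standard deviation.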

Confidence Interval

A confidence interval (CI) is another metric that you can compute from a statistic. It gives a range of values that the statistic would be expected to fall within if it were repeatedly observed, with a probability (the confidence level) chosen by the researcher. For example, you might want to know the 90% confidence interval of the mean for the above data. In other words, if we could observe the sampling distribution of the mean, where would 90% of those means lie? You know the mean $\bar{x}$ and the standard error of the mean $\text{SE}(\bar{x})=\frac{\sigma}{\sqrt{n}}$. From the standard normal distribution quantile function you know that 90% of the distribution is contained between roughly -1.64 and +1.64. With that you can find your confidence interval bounds by computing,

$$ \text{CI}_{\text{lower}} = \bar{x}-1.64\cdot \text{SE}(\bar{x}) $$

$$ \text{CI}_{\text{upper}} = \bar{x}+1.64\cdot \text{SE}(\bar{x}) $$
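
A short sketch of the quick formula in Python. The `data` list is hypothetical, and the sample standard deviation is used as a stand-in for $\sigma$, which is an assumption of this example (with a sample this small, a t-based interval would usually be preferred).

```python
import math
import statistics

# Hypothetical sample, invented for illustration (not the article's data).
data = [2, 3, 3, 5, 7, 10]
n = len(data)

x_bar = statistics.fmean(data)

# Assumption: the sample standard deviation stands in for sigma.
se = statistics.stdev(data) / math.sqrt(n)

# For a 90% interval, 5% is left in each tail, so we need the 0.95 quantile.
z = statistics.NormalDist().inv_cdf(0.95)

ci_lower = x_bar - z * se
ci_upper = x_bar + z * se

print(f"z = {z:.3f}")
print(f"90% CI for the mean: ({ci_lower:.2f}, {ci_upper:.2f})")
```

Computing `z` from the quantile function rather than hard-coding 1.64 makes it easy to switch to another confidence level.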

That’s the easy way (there’s no context as to how/why we come to that formula). The hard way is understanding why you apply that formula in situations where the data is (approximately) normally distributed. The method shown here to derive confidence intervals is the pivotal quantity method.

You start by computing your statistic using the data. In this example we’re computing the mean $\bar{x}$. This has a sampling distribution $\bar{x} \sim \mathcal{N}(\mu,\sigma^2/n)$ since it’s a statistic that is a random variable. But we can’t do inference on that statistic directly; its sampling distribution depends on the unknown parameter $\mu$ (here $\sigma$ is treated as known). One thing we can do is transform the statistic to be distributed standard normal so that it is a pivotal quantity (its probability distribution does not depend on the unknown parameters).

$$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1)$$

We can do inference on $Z$ using the standard normal distribution. Specifically, we want to find the confidence interval bounds such that,

$$P(-Z_{\alpha/2} \leq Z \leq Z_{\alpha/2}) = 1-\alpha$$

where $\alpha\in(0,1)$ is a chosen significance level, so that $1-\alpha$ is the confidence level. The lower bound $-Z_{\alpha/2}$ is calculated from the standard normal quantile function evaluated at $\frac{\alpha}{2}$, and is the value such that $\frac{\alpha}{2}$ proportion of the distribution lies below it; by symmetry, the same proportion lies above $Z_{\alpha/2}$. Finally, $1-\alpha$ is how often we would like to observe $Z$ between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. So if $\alpha = 0.05$ then $\text{CDF}^{-1}(0.05/2) \approx -1.96$, which means $1-\alpha=0.95$ proportion (95%) of the distribution lies between -1.96 and 1.96 under the standard normal distribution.
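
The quantiles can be checked numerically with `statistics.NormalDist` from the Python standard library. The helper name `z_bounds` is made up for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_bounds(alpha):
    """Return the symmetric bounds that leave alpha/2 in each tail."""
    lower = std_normal.inv_cdf(alpha / 2)      # alpha/2 of the mass lies below
    upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 of the mass lies above
    return lower, upper

for alpha in (0.10, 0.05, 0.01):
    lower, upper = z_bounds(alpha)
    # Probability mass between the two bounds, which should equal 1 - alpha.
    coverage = std_normal.cdf(upper) - std_normal.cdf(lower)
    print(f"alpha={alpha:.2f}  bounds=({lower:.3f}, {upper:.3f})  coverage={coverage:.3f}")
```

With $\alpha = 0.05$ the bounds come out near ±1.96 and cover 95% of the distribution; with $\alpha = 0.10$ they come out near ±1.645 and cover 90%.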

But we don’t really care about the probability of observing the standard normal variable $Z$. We’re interested in $\mu$. So we can take the confidence interval equation above, substitute the pivotal quantity, and rearrange the inequalities to isolate $\mu$.

$$P(-Z_{\alpha/2} \leq Z \leq Z_{\alpha/2}) = 1-\alpha$$

$$P(-Z_{\alpha/2} \leq \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \leq Z_{\alpha/2}) = 1-\alpha$$

$$P(-\bar{x} -Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \leq -\mu \leq -\bar{x} + Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}) = 1-\alpha$$

$$P(\bar{x} + Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \geq \mu \geq \bar{x} - Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}) = 1-\alpha$$

$$P(\bar{x} - Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}) = 1-\alpha$$

This maps back to our quick formula at the beginning of this section. The derivation shows that a confidence interval is not simply the probability that $\mu$ lies in some fixed range. The random quantity is $\bar{x}$, not $\mu$: the probability statement holds under the sampling distribution of $\bar{x}$. A confidence interval describes bounds around a statistic such that, across realizations of that statistic from its sampling distribution, the bounds contain $\mu$ with some probability (a probability that you define).
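
One way to make this concrete is a coverage simulation: draw many samples from a population whose mean is known, build the interval from each sample mean, and count how often the interval contains $\mu$. This is a sketch under assumed values; `mu`, `sigma`, `n`, and `alpha` are invented for the example, and $\sigma$ is treated as known, as in the derivation.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical normal population and a 90% confidence level.
mu, sigma, n, alpha = 50.0, 10.0, 25, 0.10

z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sigma / math.sqrt(n)  # Z_{alpha/2} * sigma / sqrt(n)

trials = 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    # The interval moves with x_bar from sample to sample; mu stays fixed.
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share of intervals that contain $\mu$ should come out close to $1-\alpha$. Any single interval either contains $\mu$ or it does not; the probability describes the procedure over repeated samples.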

The short of it is that this is a pretty fucked up way to do inference, and it’s not surprising how often confidence intervals get interpreted incorrectly.