Logging Experiences

Sample Variance vs. Population Variance: Bessel’s Correction

Posted in Math, Statistics by Sina Iravanian on August 21, 2011

Suppose you have a database of $N$ items, and this database forms the whole population for the statistical operations that follow. If you calculate the mean, variance, and standard deviation of these items, then you are actually computing the population mean ($\mu$), the population variance ($\sigma^2$), and the population standard deviation ($\sigma$).

But if you draw some random samples out of the population, then you are actually sampling the population, and estimating the true statistics using those samples (maybe because it is expensive to do the calculations for the whole population). Statisticians usually use different names and notations for the values calculated from samples, e.g., the sample mean ($\bar{x}$).

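Suppose the sample has $n$ items. If the sample variance is calculated with the same formula as the population variance, i.e., by dividing the sum of squared deviations from the sample mean $\bar{x}$ by $n$, it is a biased estimator. On average it underestimates the population variance, because the deviations are measured from $\bar{x}$, which minimizes the sum of squared deviations for that particular sample, rather than from the true mean $\mu$. More formally, for independent draws its expected value does not equal the population variance:

$\mathbb{E}[\sigma^2_{sample}] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$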

To correct this bias, the sample variance is multiplied by $\frac{n}{n-1}$, which is the same as dividing the sum of squared deviations by $n-1$ instead of $n$:

$s^2 = \frac{1}{n-1} \, \sum_{i = 1}^{n} (x_i - \bar{x})^2$

This value is called the unbiased sample variance ($s^2$), because it can be shown that:

$\mathbb{E}[s^2] = \sigma^2$

To distinguish the two, the biased sample variance is denoted $s_n^2$.
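The claim $\mathbb{E}[s^2] = \sigma^2$ can be checked exactly on a toy example. The Python sketch below assumes a hypothetical population of four numbers and sampling with replacement, so the $n$ draws are independent and every ordered sample is equally likely. It enumerates all of those samples and averages $s_n^2$ and $s^2$ over them using exact fractions. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def mean(xs):
    """Exact mean of a sequence of integers."""
    return Fraction(sum(xs), len(xs))


def biased_var(xs):
    """Divide the sum of squared deviations by the number of items."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def unbiased_var(xs):
    """Bessel-corrected version: divide by (number of items - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical population; its variance divides by N.
population = [1, 2, 3, 4]
sigma2 = biased_var(population)

# Sampling with replacement (assumed): all N**n ordered samples are equally likely.
n = 2
samples = list(product(population, repeat=n))
expected_biased = sum(biased_var(s) for s in samples) / len(samples)
expected_unbiased = sum(unbiased_var(s) for s in samples) / len(samples)

print(sigma2)             # 5/4
print(expected_biased)    # 5/8, i.e. (n - 1) / n * sigma^2
print(expected_unbiased)  # 5/4, equal to sigma^2
```

Setting `n = 3` in the same sketch again gives an average $s^2$ of exactly $\frac{5}{4}$.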

Using $n - 1$ instead of $n$ in the formula for variance is called Bessel’s correction.
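As a minimal sketch in Python, the two estimators differ only in the divisor. The function names and the sample below are hypothetical; the standard library's `statistics` module provides both versions, with `statistics.pvariance` dividing by $n$ and `statistics.variance` applying Bessel's correction.

```python
import statistics


def biased_variance(xs):
    """s_n^2: mean of squared deviations from the sample mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_variance(xs):
    """s^2: Bessel-corrected sample variance (divides by n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical sample of n = 8 observations with mean 5.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(biased_variance(sample))       # 4.0 (32 / 8)
print(unbiased_variance(sample))     # 32 / 7, about 4.5714
# Cross-check with the standard library.
print(statistics.pvariance(sample))  # 4.0
print(statistics.variance(sample))   # 32 / 7, about 4.5714
```

Multiplying 4.0 by $\frac{n}{n-1} = \frac{8}{7}$ gives the corrected value, in line with the formula for $s^2$.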

Posted in Math, Statistics by Sina Iravanian on August 21, 2011

The expected value of a continuous random variable $X$ is given by:

$\mathbb{E}[X] = \int_{-\infty}^{+\infty} x\,f(x)\,dx$

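where $f$ is the probability density function of the random variable $X$. Now the question is how to calculate $\mathbb{E}[g(X)]$, e.g., $\mathbb{E}[X^2]$. Do we first need to find the probability density function of the new random variable $Y = g(X)$? The answer is that we don’t need to. Applying $g$ changes the value being averaged from $x$ to $g(x)$, but not how likely that value is: the outcome $g(x)$ occurs exactly when $X$ takes the value $x$, so it carries the same weight $f(x)\,dx$.

Therefore: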

$\mathbb{E}[g(X)] = \int_{-\infty}^{+\infty} g(x)\,f(x)\,dx$.
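This identity is often called the law of the unconscious statistician. A quick numerical check: the Python sketch below assumes $X \sim \text{Uniform}(0, 1)$ and $g(x) = x^2$, and uses a simple midpoint-rule integrator (a hypothetical helper chosen for illustration). It computes $\mathbb{E}[X^2]$ once from $f$ alone and once the long way, via the density $f_Y(y) = \frac{1}{2\sqrt{y}}$ of $Y = X^2$ on $(0, 1]$. Both should come out near $\frac{1}{3}$.

```python
import math


def midpoint_integral(h, a, b, steps=100_000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(h(a + (i + 0.5) * width) for i in range(steps))


def f_x(x):
    """Density of X ~ Uniform(0, 1) (assumed for this example)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def g(x):
    return x ** 2


def f_y(y):
    """Density of Y = X^2 on (0, 1]; needed only for the long way."""
    return 1.0 / (2.0 * math.sqrt(y))


# f_x is zero outside [0, 1], so integrating over [0, 1] covers the whole line.
# LOTUS: only the density of X is needed.
lotus = midpoint_integral(lambda x: g(x) * f_x(x), 0.0, 1.0)

# Long way: find the density of Y = g(X) first, then compute E[Y].
# The midpoint rule never evaluates y = 0, so f_y stays finite.
via_f_y = midpoint_integral(lambda y: y * f_y(y), 0.0, 1.0)

print(lotus, via_f_y)  # both approximately 0.333333
```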
