Suppose you have a database of items. This database forms the whole population for the statistical operations that follow. If you calculate the mean, variance, and standard deviation of all these items, you are computing the population mean ($\mu$), the population variance ($\sigma^2$), and the population standard deviation ($\sigma$).
But if you draw a random sample from the population, you are estimating the true statistics from that sample (perhaps because it is too expensive to do the calculations for the whole population). Statisticians usually use different names and notations for values calculated from samples, e.g., the sample mean ($\bar{x}$).
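To make the distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The `population` values, the sample size of 5, and the fixed seed are made up for illustration and are not part of the original discussion.

```python
import random
import statistics

# Hypothetical population: every item in the "database" (illustrative values).
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population statistics: computed over every item.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance
sigma = statistics.pstdev(population)      # population standard deviation

# Sample statistic: computed from a random subset only.
rng = random.Random(0)               # fixed seed so the run is repeatable
sample = rng.sample(population, 5)   # draw 5 items without replacement
x_bar = statistics.mean(sample)      # sample mean, an estimate of mu

print(mu, sigma2, sigma, x_bar)
```

The first three values describe the population exactly, while `x_bar` changes with the items that happen to be drawn.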
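The sample variance calculated with the same formula as the population variance is biased, because the deviations are measured from the sample mean, which is itself computed from the same sample items. More formally, for a sample $x_1, \dots, x_n$, its expected value does not equal the population variance:

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] \neq \sigma^2$$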
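To solve this problem, the sample variance is corrected by multiplying it by $\frac{n}{n-1}$, or simply by using $n-1$ instead of $n$ when calculating the mean of squared deviations, i.e.:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

where $n$ is the sample size.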
This value is called the unbiased sample variance ($s^2$), because it can be proved that [+]:

$$E[s^2] = \sigma^2$$
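One way to see where the $n-1$ comes from is the following sketch. It assumes the sample values $x_1, \dots, x_n$ are independent draws with mean $\mu$ and variance $\sigma^2$; that assumption is added here for the derivation.

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2 \\
E\left[\sum_{i=1}^{n}(x_i-\mu)^2\right] &= n\sigma^2, \qquad E\left[(\bar{x}-\mu)^2\right] = \frac{\sigma^2}{n} \\
E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] &= n\sigma^2 - \sigma^2 = (n-1)\sigma^2
\end{aligned}
$$

Dividing the sum of squared deviations by $n-1$ therefore gives an expected value of exactly $\sigma^2$, while dividing by $n$ gives $\frac{n-1}{n}\sigma^2$.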
To keep the two notations distinct, the biased sample variance is written $s_n^2$.
Using $n-1$ instead of $n$ in the formula for variance is called Bessel’s correction.
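The effect of the correction can be checked with a small simulation. The sketch below is illustrative only: it assumes a standard normal population (true variance 1), a sample size of 5, and 20000 repetitions, none of which come from the text above. The helper names `biased_variance`, `unbiased_variance`, and `average_estimates` are hypothetical.

```python
import math
import random
import statistics

def biased_variance(xs):
    """Mean of squared deviations from the sample mean (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    """Same sum of squares, rescaled by n / (n - 1) (Bessel's correction)."""
    n = len(xs)
    return biased_variance(xs) * n / (n - 1)

def average_estimates(n=5, trials=20000, seed=0):
    """Average both estimators over many samples from an assumed N(0, 1)."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
        biased_total += biased_variance(xs)
        unbiased_total += unbiased_variance(xs)
    return biased_total / trials, unbiased_total / trials

# Cross-check against the standard library, which divides by n - 1.
data = [1.0, 2.0, 4.0, 7.0]
assert math.isclose(unbiased_variance(data), statistics.variance(data))

avg_biased, avg_unbiased = average_estimates()
print(avg_biased, avg_unbiased)
```

Under these assumptions the first average should settle near $(n-1)/n = 0.8$ and the second near 1, which is the bias that the correction removes.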
The expected value of a continuous random variable $X$ is given by:

$$E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$$
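where $f(x)$ is the probability density function of the random variable $X$. Now the question is: how do we calculate $E[g(X)]$, e.g., $E[X^2]$? Do we need to know the density function of $X^2$? The answer is that we don’t. No matter what we do with $X$ by applying a function $g$ to it, we have:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$$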
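As a rough numerical check of this idea, the sketch below computes $E[X^2]$ by integrating $x^2$ against the density of $X$ itself, and compares it with a plain average over random draws. The choice of a Uniform(0, 1) distribution, the midpoint rule, and the helper names `pdf_uniform01` and `expectation` are assumptions made for the example.

```python
import random

def pdf_uniform01(x):
    """Density of the assumed example distribution, Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def expectation(g, pdf, lo, hi, steps=100_000):
    """Approximate the integral of g(x) * pdf(x) over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint of the i-th slice
        total += g(x) * pdf(x)
    return total * width

# E[X^2] using only the density of X, never the density of X^2.
by_integral = expectation(lambda x: x ** 2, pdf_uniform01, 0.0, 1.0)

# Cross-check: average g(X) over random draws of X.
rng = random.Random(0)
draws = 200_000
by_sampling = sum(rng.random() ** 2 for _ in range(draws)) / draws

print(by_integral, by_sampling)  # both should be close to 1/3
```

For this distribution the exact answer is $\int_0^1 x^2\, dx = 1/3$, so both numbers should land near 0.333.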