Bessel's Correction


A short proof of why the maximum likelihood estimator (MLE) of the variance is biased.

When we measure the variation of our random variable (r.v.) around the sample mean, the mean is itself computed from the same observations. The sample mean is exactly the point that minimizes the squared deviations of the sample, so using it in place of the true mean removes one degree of freedom and skews the result towards less variance.

Conversely, the MLE of the variance computed with respect to the true population mean is unbiased. This can be seen in the factor (n-1)/n, which tends to 1 as n grows and the sample mean converges to the true population mean.


In other words:
$$ \sigma_{MVU}^2 = \frac{n}{n-1} \sigma_{ML}^2 $$
$$ = \frac{1}{n-1} \sum_{i=1}^n (x_i - \mu_{ML})^2$$
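The bias can also be checked numerically. Below is a minimal Monte Carlo sketch (assuming NumPy; the population parameters mu = 3, sigma = 2 and sample size n = 5 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 3.0, 2.0, 5, 200_000

# Draw `trials` independent samples of size n from N(mu, sigma^2).
samples = rng.normal(mu, sigma, size=(trials, n))
sample_means = samples.mean(axis=1, keepdims=True)
sq_dev = ((samples - sample_means) ** 2).sum(axis=1)

var_ml = sq_dev / n         # MLE: divide by n (biased)
var_mvu = sq_dev / (n - 1)  # Bessel-corrected: divide by n - 1 (unbiased)

print(var_ml.mean())   # close to (n-1)/n * sigma^2 = 3.2
print(var_mvu.mean())  # close to sigma^2 = 4.0
```

Averaged over many samples, dividing by n systematically underestimates sigma^2 by the factor (n-1)/n, while dividing by n - 1 recovers it.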


Full proof:

$$ \mathbb{E}[\sigma_{ML}^2] = \mathbb{E}[\frac{1}{n} \sum_{i=1}^n(x_i - \mu_{ML})^2] $$

$$ = \color{red}\frac{1}{n} \sum_{i=1}^n \mathbb{E}[x_i^2] \color{black} -
\color{blue}2 \, \mathbb{E}[x_i \mu_{ML}]\color{black} +
\color{green}\mathbb{E}[\mu_{ML}^2] \color{black}$$

(By symmetry, each cross term E[x_i mu_ML] is identical, so the average over i collapses to a single term.)

$$ = \color{red} \frac{1}{n} \sum_{i=1}^n \mathbb{E}[x_i^2] \color{black} -
\color{blue} \frac{2}{n} \mathbb{E}[x_i \sum_{j=1}^n x_j] \color{black} +
\color{green} \frac{1}{n^2} \mathbb{E}[(\sum_{i=1}^n x_i)^2]\color{black}$$

$$ = \color{red} \frac{1}{n} \sum_{i=1}^n (\sigma^2 + \mu^2) \color{black} -
\color{blue}\frac{2}{n} \mathbb{E}[x_i^2 + x_i \sum_{j \neq i} x_j] \color{black} +
\color{green}\frac{1}{n^2}\mathbb{E}[\sum_{i=1}^n x_i^2 + \sum_{i \neq j} x_i x_j]\color{black}$$

$$ = \color{red} \sigma^2 + \mu^2 \color{black} -
\color{blue} 2(\frac{1}{n}(\sigma^2 + \mu^2) + \frac{n-1}{n} \mu^2)\color{black} +
\color{green}\frac{n}{n^2} (\mu^2 + \sigma^2) + \frac{n^2 - n}{n^2} \mu^2 \color{black} $$

$$ = \frac{n-1}{n} \sigma^2 $$

Given that:
$$
\begin{cases}
\mathbb{E}[x_i x_j] = \mu^2 + \sigma^2 & \text{if } i = j \\
\mathbb{E}[x_i x_j] = \mu^2 & \text{if } i \neq j
\end{cases}
$$
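These two moment identities, together with linearity of expectation, are enough to verify the whole result symbolically. A sketch assuming SymPy, with a small concrete n = 4 chosen purely for illustration:

```python
import sympy as sp

n = 4  # a small concrete sample size, for illustration only
mu, sigma = sp.symbols("mu sigma", positive=True)
xs = sp.symbols(f"x0:{n}")

# sigma_ML^2 = (1/n) * sum_i (x_i - mu_ML)^2, with mu_ML the sample mean.
mean_ml = sp.Rational(1, n) * sum(xs)
var_ml = sp.expand(sp.Rational(1, n) * sum((x - mean_ml) ** 2 for x in xs))

# Take the expectation term by term, using the two moment rules:
# E[x_i x_j] = mu^2 for i != j, and E[x_i^2] = mu^2 + sigma^2.
subs = [(xi * xj, mu**2) for i, xi in enumerate(xs)
        for j, xj in enumerate(xs) if i < j]
subs += [(xi**2, mu**2 + sigma**2) for xi in xs]
expected = sp.simplify(var_ml.subs(subs))

print(expected)  # 3*sigma**2/4, i.e. (n-1)/n * sigma^2 for n = 4
```

The mu^2 terms cancel exactly, leaving only the (n-1)/n factor in front of sigma^2, in agreement with the proof above.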

By Yann HOFFMANN