1. Maximum Likelihood Estimation (ML, MLE)
1) Notations
- Data (Input, Target): $ (x_{1..S}, \ y_{1..S}) $
- Set of parameters: $\theta$
- Model: $ f_\theta(x_i) $
- Error: $ \epsilon_i \sim N(0, \sigma_i^2) $
2) Likelihood (function)
- Definition
  A function that expresses the probability of a sample of data given a set of model parameter values.
- Single data point
  \(\begin{equation} \begin{aligned} L(\theta) &= p_\theta(x_i) = p(x_i \mid \theta) \\ &= p(y_i \mid f_\theta(x_i)) \end{aligned} \end{equation}\)
- Multiple data points
  $ L(\theta) = \prod_i p(y_i \mid f_\theta(x_i)) $
- Log likelihood
  $ l(\theta) = \sum_i \log p(y_i \mid f_\theta(x_i)) $
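The product/sum relationship above can be checked numerically. A minimal sketch, assuming toy data and a simple linear model $f_\theta(x) = ax + b$ with Gaussian noise (all values below are made up for illustration):

```python
import math

# Toy data and parameters (assumed for illustration only).
xs = [0.0, 1.0, 2.0]
ys = [0.1, 0.9, 2.1]
a, b, sigma = 1.0, 0.0, 1.0  # parameters theta and a fixed noise scale

def gaussian_pdf(y, mu, sigma):
    """Density p(y | mu, sigma^2) of N(mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# L(theta) = prod_i p(y_i | f_theta(x_i))
likelihood = math.prod(gaussian_pdf(y, a * x + b, sigma) for x, y in zip(xs, ys))

# l(theta) = sum_i log p(y_i | f_theta(x_i))
log_likelihood = sum(math.log(gaussian_pdf(y, a * x + b, sigma)) for x, y in zip(xs, ys))

# The log of the product equals the sum of the logs (up to float rounding).
print(math.isclose(math.log(likelihood), log_likelihood))  # True
```

Working with the log likelihood avoids the numerical underflow that multiplying many small densities would cause.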
3) Maximum Likelihood Estimation
- Definition
  A method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.
- Maximum Likelihood Estimate
  The point in the parameter space that maximizes the likelihood function.
- Examples
- Example 1. Gaussian distribution model with fixed variance
  The model $f_\theta(x_i)$ predicts the mean of the likelihood, $\mu_i = E[y_i \mid \mu_i, \sigma^2_i]$.
\(\begin{equation} \begin{aligned} y_i \mid \mu_i, \sigma^2_i \ &\sim N(\mu_i, \sigma^2_i) \quad \text{s.t.} \ \color{red}{\mu_i = f_\theta(x_i)} \\ p(y_i \mid \mu_i, \sigma^2_i) &= (2\pi\sigma^2_i)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\sigma^2_i} (y_i - \mu_i)^2 \right\} \\ -\log p(y_i \mid \mu_i, \sigma^2_i) &= \frac{1}{2} \log (2\pi \sigma^2_i) + \frac{1}{2\sigma^2_i}(y_i - \mu_i)^2 \\ -\log p(y_i \mid \mu_i) &\propto \frac{1}{2}(y_i - \mu_i)^2 \\ -\log p(y_i \mid \color{red}{f_\theta(x_i)}) &= \color{blue}{\frac{1}{2}(y_i - \color{red}{f_\theta(x_i)})^2} \end{aligned} \end{equation}\)
  Maximize Likelihood → Minimize Mean Squared Error (MSE)
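The equivalence in Example 1 can be verified numerically: with $\sigma^2_i$ fixed, the negative log likelihood differs from the (half) squared error only by a constant, so both are minimized by the same prediction. A minimal sketch with assumed toy values:

```python
import math

sigma = 1.0  # fixed, known noise scale (assumed)
# Term of -log p(y | mu, sigma^2) that does not depend on the prediction.
const = 0.5 * math.log(2 * math.pi * sigma ** 2)

def nll_gaussian(y, f):
    """Negative log likelihood -log p(y | f, sigma^2) of N(f, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - f) ** 2 / (2 * sigma ** 2)

# For any (target, prediction) pair, NLL minus the constant is 0.5*(y - f)^2,
# so gradient-based minimization of either objective picks the same theta.
for y, f in [(1.0, 0.2), (-0.5, 1.5), (3.0, 3.0)]:
    assert math.isclose(nll_gaussian(y, f) - const, 0.5 * (y - f) ** 2)
```

Note this relies on the variance being fixed; if $\sigma^2_i$ were also learned, the $\log(2\pi\sigma^2_i)$ term would no longer be constant.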
- Example 2. Bernoulli distribution
  The model $f_\theta(x_i)$ predicts the probability of success $p_i$.
\(\begin{equation} \begin{aligned} y_i \mid p_i \ &\sim \mathrm{Ber}(p_i) \quad \text{s.t.} \ \color{red}{p_i = f_\theta(x_i)} \\ p(y_i \mid p_i) &= p_i^{y_i}(1 - p_i)^{1 - y_i} \\ \log p(y_i \mid p_i) &= y_i \log p_i + (1 - y_i)\log(1 - p_i) \\ -\log p(y_i \mid \color{red}{f_\theta(x_i)}) &= \color{blue}{-[y_i \log \color{red}{f_\theta(x_i)} + (1 - y_i)\log(1 - \color{red}{f_\theta(x_i)})]} \end{aligned} \end{equation}\)
  Maximize Likelihood → Minimize Cross Entropy Error (CEE)
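The equivalence in Example 2 is exact: the Bernoulli negative log likelihood and the binary cross entropy are the same expression. A minimal numeric check with assumed toy labels and predicted probabilities:

```python
import math

def nll_bernoulli(y, p):
    """-log p(y | p) for a Bernoulli label y in {0, 1} with success probability p."""
    return -math.log(p if y == 1 else 1 - p)

def binary_cross_entropy(y, p):
    """-[y log p + (1 - y) log(1 - p)], the per-sample cross entropy."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The two coincide for every label/probability pair, so minimizing CEE
# over theta is exactly maximum likelihood under the Bernoulli model.
for y, p in [(1, 0.9), (0, 0.9), (1, 0.2), (0, 0.5)]:
    assert math.isclose(nll_bernoulli(y, p), binary_cross_entropy(y, p))
```

In practice $f_\theta(x_i)$ is kept in $(0, 1)$ (e.g. via a sigmoid output) so the logarithms stay finite.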