1. Maximum Likelihood Estimation (ML, MLE)
1) Notations
- Data (Input, Target): $ (x_{1..S}, \ y_{1..S}) $
- Set of parameters: $\theta$
- Model: $ f_\theta(x_i) $
- Error: $ \epsilon_i \sim N(0, \sigma_i^2) $
2) Likelihood (function)
- Definition
  A function that expresses the probability of a sample of data given a set of model parameter values.
- Single data point
  \(\begin{equation} \begin{aligned} L(\theta) &= p_\theta(x_i) = p(x_i \mid \theta) \\ &= p(y_i \mid f_\theta(x_i)) \end{aligned} \end{equation}\)
- Multiple data points
  $ L(\theta) = \prod_i p(y_i \mid f_\theta(x_i)) $
- Log likelihood
  $ l(\theta) = \sum_i \log p(y_i \mid f_\theta(x_i)) $
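The product/sum relationship above can be checked numerically. A minimal sketch, assuming toy data and a simple linear model $f_\theta(x) = ax + b$ with Gaussian noise (all values below are made up for illustration):

```python
import math

# Toy data and parameters (assumed for illustration only).
xs = [0.0, 1.0, 2.0]
ys = [0.1, 0.9, 2.1]
a, b, sigma = 1.0, 0.0, 1.0  # parameters theta and a fixed noise scale

def gaussian_pdf(y, mu, sigma):
    """Density p(y | mu, sigma^2) of N(mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# L(theta) = prod_i p(y_i | f_theta(x_i))
likelihood = math.prod(gaussian_pdf(y, a * x + b, sigma) for x, y in zip(xs, ys))

# l(theta) = sum_i log p(y_i | f_theta(x_i))
log_likelihood = sum(math.log(gaussian_pdf(y, a * x + b, sigma)) for x, y in zip(xs, ys))

# The log of the product equals the sum of the logs (up to float rounding).
print(math.isclose(math.log(likelihood), log_likelihood))  # True
```

Working with the log likelihood avoids the numerical underflow that multiplying many small densities would cause.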
3) Maximum Likelihood Estimation
- Definition
  A method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.
- Maximum Likelihood Estimate
  The point in the parameter space that maximizes the likelihood function.
- Examples
- Example 1. Gaussian distribution model with fixed variance
  The model $f_\theta(x_i)$ predicts the mean of the likelihood, $\mu_i = E[y_i \mid \mu_i, \sigma^2_i]$.
\(\begin{equation} \begin{aligned} y_i \mid \mu_i, \sigma^2_i \ &\sim N(\mu_i, \sigma^2_i) \quad \text{s.t.} \ \color{red}{\mu_i = f_\theta(x_i)} \\ p(y_i \mid \mu_i, \sigma^2_i) &= (2\pi\sigma^2_i)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\sigma^2_i} (y_i - \mu_i)^2 \right\} \\ -\log p(y_i \mid \mu_i, \sigma^2_i) &= \frac{1}{2} \log (2\pi \sigma^2_i) + \frac{1}{2\sigma^2_i}(y_i - \mu_i)^2 \\ -\log p(y_i \mid \mu_i) &\propto \frac{1}{2}(y_i - \mu_i)^2 \\ -\log p(y_i \mid \color{red}{f_\theta(x_i)}) &= \color{blue}{\frac{1}{2}(y_i - \color{red}{f_\theta(x_i)})^2} \end{aligned} \end{equation}\)
  Maximize Likelihood → Minimize Mean Squared Error (MSE)
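The equivalence in Example 1 can be verified numerically: with $\sigma^2_i$ fixed, the negative log likelihood differs from the (half) squared error only by a constant, so both are minimized by the same prediction. A minimal sketch with assumed toy values:

```python
import math

sigma = 1.0  # fixed, known noise scale (assumed)
# Term of -log p(y | mu, sigma^2) that does not depend on the prediction.
const = 0.5 * math.log(2 * math.pi * sigma ** 2)

def nll_gaussian(y, f):
    """Negative log likelihood -log p(y | f, sigma^2) of N(f, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - f) ** 2 / (2 * sigma ** 2)

# For any (target, prediction) pair, NLL minus the constant is 0.5*(y - f)^2,
# so gradient-based minimization of either objective picks the same theta.
for y, f in [(1.0, 0.2), (-0.5, 1.5), (3.0, 3.0)]:
    assert math.isclose(nll_gaussian(y, f) - const, 0.5 * (y - f) ** 2)
```

Note this relies on the variance being fixed; if $\sigma^2_i$ were also learned, the $\log(2\pi\sigma^2_i)$ term would no longer be constant.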
- Example 2. Bernoulli distribution
  The model $f_\theta(x_i)$ predicts the probability of success $p_i$.
\(\begin{equation} \begin{aligned} y_i \mid p_i \ &\sim \mathrm{Ber}(p_i) \quad \text{s.t.} \ \color{red}{p_i = f_\theta(x_i)} \\ p(y_i \mid p_i) &= p_i^{y_i}(1 - p_i)^{1 - y_i} \\ \log p(y_i \mid p_i) &= y_i \log p_i + (1 - y_i)\log(1 - p_i) \\ -\log p(y_i \mid \color{red}{f_\theta(x_i)}) &= \color{blue}{-[y_i \log \color{red}{f_\theta(x_i)} + (1 - y_i)\log(1 - \color{red}{f_\theta(x_i)})]} \end{aligned} \end{equation}\)
  Maximize Likelihood → Minimize Cross Entropy Error (CEE)
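The equivalence in Example 2 is exact: the Bernoulli negative log likelihood and the binary cross entropy are the same expression. A minimal numeric check with assumed toy labels and predicted probabilities:

```python
import math

def nll_bernoulli(y, p):
    """-log p(y | p) for a Bernoulli label y in {0, 1} with success probability p."""
    return -math.log(p if y == 1 else 1 - p)

def binary_cross_entropy(y, p):
    """-[y log p + (1 - y) log(1 - p)], the per-sample cross entropy."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The two coincide for every label/probability pair, so minimizing CEE
# over theta is exactly maximum likelihood under the Bernoulli model.
for y, p in [(1, 0.9), (0, 0.9), (1, 0.2), (0, 0.5)]:
    assert math.isclose(nll_bernoulli(y, p), binary_cross_entropy(y, p))
```

In practice $f_\theta(x_i)$ is kept in $(0, 1)$ (e.g. via a sigmoid output) so the logarithms stay finite.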