Probabilistic Estimation

 

1. Maximum Likelihood Estimation (ML, MLE)

1) Notations

  • Data
    (Input, Target) : $ (x_{1..S}, \ y_{1..S}) $

  • Set of parameters
    $\theta$

  • Model
    $ f_\theta(x_i) $

  • Error
    $ \epsilon_i \sim N(0, \sigma_i^2) $

2) Likelihood (function)

  • Definition
    A function that expresses the probability of a sample of data given a set of model parameter values

  • Single data
    \(\begin{equation} \begin{aligned} L(\theta) &= p_\theta(x_i) = p(x_i | \theta) \\ &= p(y_i | f_\theta(x_i)) \end{aligned} \end{equation}\)

  • Multiple data
    $ L(\theta) = \Pi_i p(y_i | f_\theta(x_i)) $

  • Log likelihood
    $ l(\theta) = \sum_i log[p(y_i | f_\theta(x_i))] $

3) Maximum Likelihood Estimation

  • Definition
    A method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable

  • Maximum Likelihood Estimate
    The point in the parameter space that maximizes the likelihood function

  • Examples

    • Example 1. Gaussian distribution model with fixed variance
      Model $f_\theta(x_i)$ predicts Likelihood mean $E[y_i | \mu_i, \sigma^2_i] (= \mu_i$)

      \(\begin{equation} \begin{aligned} y_i | \mu_i, \sigma^2_i \ &\sim ~ N(\mu_i, \sigma^2_i) \quad s.t. \ \color{red}{\mu_i = f_\theta(x_i)} \\ p(y_i | \mu_i, \sigma^2_i) &= (2\pi\sigma^2_i)^{-\frac{1}{2}} exp\{ -\frac{1}{2\sigma^2_i} (y_i - \mu_i)^2 \} \\ -log[p(y_i | \mu_i, \sigma^2_i)] &= \frac{1}{2} log (2\pi \sigma^2_i) + \frac{1}{2\sigma^2_i}(y_i - \mu_i)^2 \\ -log[p(y_i | \mu_i)] &∝ \frac{1}{2}(y_i - \mu_i)^2 \\ -log[p(y_i | \color{red}{f_\theta(x_i)})]&= \color{blue}{\frac{1}{2}(y_i - \color{red}{f_\theta(x_i)})^2} \end{aligned} \end{equation}\)
      Maximize Likelihood → Minimize Mean Squared Error (MSE)

    • Example 2. Bernoulli distribution
      Model $f_\theta(x_i)$ predicts the Probability of success $p_i$

      \(\begin{equation} \begin{aligned} \tilde{y_i} = y_i | p_i \ &\sim ~ Ber(p_i) \quad s.t. \ \color{red}{p_i = f_\theta(x_i)} \\ p(y_i | p_i) &= p^{y_{i}}_i(1 - p_i)^{1 - y_i} \\ log[p(y_i | p_i)] &= y_i log p_i + (1 - y_i)log(1 - p_i) \\ -log[p(y_i | \color{red}{f_\theta(x_i)})] &= \color{blue}{-[y_i log \color{red}{f_\theta(x_i)} + (1 - y_i)log(1 - \color{red}{f_\theta(x_i)})]} \\ \end{aligned} \end{equation}\)
      Maximize Likelihood → Minimize Cross Entropy Error (CEE)