Gradient of linear regression

 

Gradient
The gradient is the vector of first-order partial derivatives of a multivariable function. Its direction points in the direction in which the function value increases most rapidly, and its magnitude indicates how large that change is.
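For instance, for a simple two-variable function such as \( f(x_1, x_2) = x_1^2 + 3x_2 \) (an illustrative example, not part of the derivation below), the gradient is

\( \nabla f(x_1, x_2) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 3 \end{pmatrix}, \)

so at \( (1, 0) \) the function increases fastest in the direction \( (2, 3) \), and the rate of that increase is \( \lVert (2, 3) \rVert = \sqrt{13} \).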


Let us derive the gradient of the MSE for linear regression.

1. $\hat{\mathbf{y}} = X \mathbf{\theta}$
\(\begin{pmatrix} \hat{y}^{(1)} \\ \vdots \\ \hat{y}^{(m)} \end{pmatrix} = \begin{pmatrix} & \mathbf{x^{(1)}}^T & \\ & \vdots & \\ & \mathbf{x^{(m)}}^T & \\ \end{pmatrix} \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_n \\ \end{pmatrix}\)
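As a quick sanity check on the shapes, here is a minimal NumPy sketch of this prediction step. The sizes (m = 4, n = 2) and the random data are assumptions made purely for illustration:

```python
import numpy as np

# Illustrative sizes: m = 4 samples, n = 2 features; x_0 = 1 is the bias term,
# so X has shape (m, n + 1) and theta has shape (n + 1,).
rng = np.random.default_rng(0)
m, n = 4, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # row i is x^(i)^T
theta = rng.normal(size=n + 1)                              # (theta_0, ..., theta_n)

y_hat = X @ theta    # y_hat[i] = x^(i)^T theta
print(y_hat.shape)   # (4,), one prediction per sample
```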


2. $ MSE(\mathbf{\theta}) $

$ = \frac{1}{m}\sum_{i=1}^m(\hat{y}^{(i)} - y^{(i)})^2 = \frac{1}{m}\sum_{i=1}^m(\mathbf{x^{(i)}}^T \theta - y^{(i)})^2 $
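Expressed in NumPy, the same quantity is a one-liner; the data below is again made up for illustration:

```python
import numpy as np

# Made-up data: m = 4 samples, n = 2 features plus the bias column x_0 = 1.
rng = np.random.default_rng(0)
m, n = 4, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
theta = rng.normal(size=n + 1)
y = rng.normal(size=m)

# MSE(theta) = (1/m) * sum_i (x^(i)^T theta - y^(i))^2
mse = np.mean((X @ theta - y) ** 2)
print(mse)
```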


3. $ \frac{\partial}{\partial \theta_j} MSE(\mathbf{\theta}) $

$ = \frac{2}{m} \sum_{i=1}^m(\mathbf{x^{(i)}}^T \theta - y^{(i)}) x^{(i)}_j $
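One way to convince yourself of this formula is to compare it with a central finite difference for a single coordinate \( \theta_j \). The sketch below uses made-up data and a hypothetical helper `mse`; none of it comes from the original derivation:

```python
import numpy as np

# Made-up data for a numerical check of d MSE / d theta_j.
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
theta = rng.normal(size=n + 1)
y = rng.normal(size=m)

def mse(t):
    return np.mean((X @ t - y) ** 2)

j = 2                                                    # one coordinate theta_j
analytic = (2 / m) * np.sum((X @ theta - y) * X[:, j])   # (2/m) sum_i (x^(i)^T theta - y^(i)) x_j^(i)

eps = 1e-6
e_j = np.zeros(n + 1)
e_j[j] = eps
numeric = (mse(theta + e_j) - mse(theta - e_j)) / (2 * eps)

print(analytic, numeric)   # the two values agree to several decimal places
```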


4. $ \frac{\partial}{\partial \mathbf{\theta}} MSE(\mathbf{\theta}) = \nabla_{\theta}MSE(\theta) $

Let $ \frac{\partial}{\partial \theta_j} MSE(\mathbf{\theta}) = \sum_i \alpha_i x_j^{(i)} $, where $ \alpha_i = \frac{2}{m}(\mathbf{x^{(i)}}^T \theta - y^{(i)}) $.
\(\frac{\partial}{\partial \mathbf{\theta}} MSE(\mathbf{\theta}) = \begin{pmatrix} \frac{\partial}{\partial \theta_0} MSE(\mathbf{\theta}) \\ \vdots \\ \frac{\partial}{\partial \theta_n} MSE(\mathbf{\theta}) \\ \end{pmatrix} = \begin{pmatrix} \sum_i \alpha_i x_0^{(i)} \\ \vdots \\ \sum_i \alpha_i x_n^{(i)} \\ \end{pmatrix} = \alpha_1 \begin{pmatrix} x_0^{(1)} \\ \vdots \\ x_n^{(1)} \\ \end{pmatrix} + \cdots + \alpha_m \begin{pmatrix} x_0^{(m)} \\ \vdots \\ x_n^{(m)} \\ \end{pmatrix} = \sum_i \alpha_i \mathbf{x}^{(i)}\)

\(= \frac{2}{m} \sum_{i=1}^m(\mathbf{x^{(i)}}^T \theta - y^{(i)}) \mathbf{x}^{(i)} \quad \cdots \quad \textit{linear combination of } \ \mathbf{x}^{(i)}\)

\(= \frac{2}{m} \begin{pmatrix} \\ \mathbf{x}^{(1)} & \cdots & \mathbf{x}^{(m)}\\ \\ \end{pmatrix} \begin{pmatrix} \mathbf{x^{(1)}}^T \theta - y^{(1)} \\ \vdots \\ \mathbf{x^{(m)}}^T \theta - y^{(m)} \\ \end{pmatrix}\)

\(= \frac{2}{m} \begin{pmatrix} \\ \mathbf{x^{(1)}} & \cdots & \mathbf{x^{(m)}}\\ \\ \end{pmatrix} \left( \begin{pmatrix} & \mathbf{x^{(1)}}^T & \\ & \vdots & \\ & \mathbf{x^{(m)}}^T & \\ \end{pmatrix} \begin{pmatrix} \\ \theta\\ \\ \end{pmatrix} - \begin{pmatrix} \\ \mathbf{y}\\ \\ \end{pmatrix} \right)\)
\(= \frac{2}{m} X^T(X\theta - \mathbf{y})\)
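The matrix form can be checked numerically against the coordinate-wise sum from step 3; the data and shapes below are illustrative assumptions:

```python
import numpy as np

# Made-up data: verify (2/m) X^T (X theta - y) against sum_i alpha_i x^(i).
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
theta = rng.normal(size=n + 1)
y = rng.normal(size=m)

grad_matrix = (2 / m) * X.T @ (X @ theta - y)      # vectorized gradient

residual = X @ theta - y                           # residual[i] = x^(i)^T theta - y^(i)
grad_sum = (2 / m) * sum(residual[i] * X[i] for i in range(m))

print(np.allclose(grad_matrix, grad_sum))          # True: both forms agree
```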