Polynomial regression

비선형 데이터를 학습하는 데 선형 모델을 사용할 수 있습니다.
각 feature의 거듭제곱을 새로운 특성으로 추가하고, 이 확장된 feature를 포함한 데이터넷에 선형 모델을 훈련시키는 방법을 polynomial regression (다항 회귀)라고 합니다.

Remarks

본 포스팅은 Hands-On Machine Learning with Scikit-Learn & TensorFlow (Auérlien Géron, 박해선(역), 한빛미디어) 를 기반으로 작성되었습니다.

다음은 target function $f(x) = 0.5x^2 + x + \epsilon $ 에서 100개의 비선형 학습 데이터를 sampling하여 target function에 근사하는 선형 모델 $\hat{f}(x)$를 학습시키는 코드입니다.

1
2
3
4
5
6
7
8
9
10
11
12
13
import random

m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)  # target function


from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

결과 그래프

1
2
3
4
5
6
7
8
9
10
import matplotlib.pyplot as plt
%matploblib inline

X_new = np.linspace(-3, 3, 100).reshape(-1, 1)   # [100, 1]
X_new_poly = poly_features.fit_transform(X_new)  # [100, 2]
y_new = lin_reg.predict(X_new_poly)
plt.plot(X, y, 'b.')
plt.plot(X_new, y_new, 'r-')
plt.xlabel('X');  plt.ylabel('y')
plt.show()

PREVIOUSEtc