SVM - Regression - Code

Import libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import data:

df = pd.read_csv("regresion.csv", sep=";", decimal=",")
print(df.head())
      X     y
0   9.0  44.7
1  10.1  78.0
2  11.6  83.0
3   9.1  80.0
4   9.7  77.0
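
The sep and decimal arguments matter here because the file uses semicolons between fields and a comma as the decimal mark. The sketch below illustrates this on a small in-memory sample; the sample text and the name df_sample are made up for illustration and are not the contents of regresion.csv.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as regresion.csv:
# semicolon-separated fields, comma as the decimal mark.
sample = io.StringIO("X;y\n9,0;44,7\n10,1;78,0\n")

# Same arguments as in the read_csv call above.
df_sample = pd.read_csv(sample, sep=";", decimal=",")

# Both columns should be parsed as floats rather than strings.
print(df_sample.dtypes)
```

Without decimal=",", values such as "44,7" would be read as text, and the model could not be fitted on them.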

Data visualization:

plt.scatter(df["X"], df["y"])
plt.xlabel("X")
plt.ylabel("y")
[Figure: scatter plot of y against X]

Model fitting:

X = df[["X"]]
print(X.head())
      X
0   9.0
1  10.1
2  11.6
3   9.1
4   9.7
y = df["y"]
print(y.head())
0    44.7
1    78.0
2    83.0
3    80.0
4    77.0
Name: y, dtype: float64
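
The double brackets in df[["X"]] keep X as a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for the features, while df["y"] is a one-dimensional Series. A minimal sketch of the difference, rebuilding the first three rows printed above into a hypothetical df_demo:

```python
import pandas as pd

# First three rows from the head() output above, used only for illustration.
df_demo = pd.DataFrame({"X": [9.0, 10.1, 11.6], "y": [44.7, 78.0, 83.0]})

X_demo = df_demo[["X"]]  # list of column names -> DataFrame, shape (3, 1)
y_demo = df_demo["y"]    # single column name  -> Series, shape (3,)

print(X_demo.shape, y_demo.shape)  # (3, 1) (3,)
```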

Linear regression:

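from sklearn.svm import SVR
# Linear-kernel support vector regressor with the defaults C=1.0 and epsilon=0.1
svm_reg = SVR(kernel="linear")
svm_reg.fit(X, y)
# Predictions on the same data used for fitting (in-sample)
y_pred = svm_reg.predict(X)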

Performance evaluation:

from sklearn.metrics import r2_score, mean_squared_error
r2_score(y, y_pred)
0.025315275959217898
mean_squared_error(y, y_pred)
1030.9657295221316
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the linear-kernel SVR predictions in dark red]
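
To make the two metrics used above easier to read, here is a minimal NumPy sketch of how they are defined: MSE is the mean of the squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper names mse and r2 and the four illustrative values are assumptions for this example, not data from regresion.csv.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot (coefficient of determination)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])

print(r2(y_true, y_true))                        # 1.0: perfect predictions
print(r2(y_true, np.full(4, y_true.mean())))     # 0.0: always predicting the mean
print(r2(y_true, np.array([4.0, 3.0, 2.0, 1.0])))  # -3.0: worse than the mean
```

An R² near 0 therefore means the model does little better than predicting the mean of y, and a negative R² means it does worse.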

Nonlinear regression:

Kernel: RBF:

svm_reg = SVR(kernel="rbf")
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
0.33146083387157166
mean_squared_error(y, y_pred)
707.1424760452836
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the RBF-kernel SVR predictions in dark red]
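
For reference, the RBF kernel measures the similarity between two points a and b as exp(-gamma * ||a - b||^2). The NumPy sketch below only illustrates that formula; the helper name rbf_kernel and the sample points are hypothetical, and this is not how SVR computes the kernel internally.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF similarity exp(-gamma * ||a - b||^2) between two points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = np.sum((a - b) ** 2)
    return float(np.exp(-gamma * sq_dist))

# Identical points have similarity 1; similarity decays with distance.
print(rbf_kernel([9.0], [9.0], gamma=0.5))
print(rbf_kernel([9.0], [10.0], gamma=0.5))
print(rbf_kernel([9.0], [12.0], gamma=0.5))
```

A larger gamma makes the similarity fall off faster, so each training point influences a narrower neighbourhood.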

Kernel: Polynomial:

svm_reg = SVR(kernel="poly", degree=4)
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
-0.11184886055676602
mean_squared_error(y, y_pred)
1176.0501045816075
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the degree-4 polynomial-kernel SVR predictions in dark red]
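
scikit-learn's polynomial kernel has the form (gamma * <a, b> + coef0)^degree, with coef0 defaulting to 0. One possible contributor to the poor fit, not verified on this dataset, is the scale of X: with values near 10 and degree 4, kernel values become very large. The sketch below only illustrates that effect; the helper poly_kernel and the choice gamma=1.0 are assumptions made for the example.

```python
import numpy as np

def poly_kernel(a, b, degree, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <a, b> + coef0) ** degree."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float((gamma * np.dot(a, b) + coef0) ** degree)

# Unscaled feature values around 10 versus values around 1.
raw = poly_kernel([10.0], [10.0], degree=4)
scaled = poly_kernel([1.0], [1.0], degree=4)

print(raw, scaled)  # 100000000.0 1.0
```

Standardizing X before fitting, for example with StandardScaler, is a common way to keep kernel values in a moderate range.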

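Model regularization:

The best model so far was the one with the RBF kernel (R² of about 0.33 on the training data, compared with about 0.03 for the linear kernel and about -0.11 for the degree-4 polynomial kernel).

Changing the hyperparameter epsilon: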

svm_reg = SVR(kernel="rbf", epsilon=0.5)
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
0.3331817646255232
mean_squared_error(y, y_pred)
705.3221739656027
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the RBF-kernel SVR predictions (epsilon=0.5) in dark red]
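
The epsilon parameter sets the half-width of the tube around the prediction inside which errors are not penalized. A minimal sketch of that epsilon-insensitive loss, max(0, |y - f(x)| - epsilon), follows; the helper name and the sample values are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y - f(x)| - epsilon)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Residuals of 0.3 and 1.0 with epsilon = 0.5:
# the first lies inside the tube (no loss), the second exceeds it by 0.5.
losses = epsilon_insensitive_loss([10.0, 10.0], [10.3, 11.0], epsilon=0.5)
print(losses)
```

With y values in the tens, moving epsilon from the default 0.1 to 0.5 changes the tube only slightly relative to the residuals, which is consistent with the small change in R² seen above.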

Changing the hyperparameter gamma:

svm_reg = SVR(kernel="rbf", gamma=0.5)
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
0.30117215778379613
mean_squared_error(y, y_pred)
739.1801044895215
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the RBF-kernel SVR predictions (gamma=0.5) in dark red]
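
When gamma is not passed, SVR uses gamma="scale", which is 1 / (n_features * X.var()). To see what gamma=0.5 is being compared against, that default can be computed by hand. The sketch below uses only the five X values printed by head() above, so the resulting number is illustrative and is not the value used in the fits on the full dataset; the variable names are hypothetical.

```python
import numpy as np

# The five X values shown in the head() output; the full file has more rows.
X_head = np.array([[9.0], [10.1], [11.6], [9.1], [9.7]])

n_features = X_head.shape[1]

# gamma="scale" uses the variance over all entries of X (population variance).
gamma_scale = 1.0 / (n_features * X_head.var())

print(gamma_scale)
```

Running the same two lines on the full X gives the gamma that the earlier default RBF fit actually used.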

Changing the hyperparameter C:

svm_reg = SVR(kernel="rbf", C=50)
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
0.5592769445039655
mean_squared_error(y, y_pred)
466.1716298815007
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the RBF-kernel SVR predictions (C=50) in dark red]
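
The role of C is easiest to see in the standard epsilon-SVR primal problem, written here for reference with the usual symbols (weights w, bias b, feature map \phi, slack variables \xi_i and \xi_i^*); the notation is the textbook one and is not taken from the original notebook.

```latex
\min_{w,\,b,\,\xi,\,\xi^*} \;
  \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
\quad \text{subject to} \quad
\begin{aligned}
  y_i - w^\top \phi(x_i) - b &\le \varepsilon + \xi_i, \\
  w^\top \phi(x_i) + b - y_i &\le \varepsilon + \xi_i^*, \\
  \xi_i,\ \xi_i^* &\ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

A larger C puts more weight on the slack terms, so points outside the epsilon-tube are penalized more heavily and the fit follows the training data more closely. That matches the rise in training R² from about 0.33 to about 0.56 with C=50, although an in-sample improvement does not by itself show better generalization.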

Changing the hyperparameters epsilon, gamma, and C together:

svm_reg = SVR(kernel="rbf", epsilon=0.5, gamma=0.7, C=50)
svm_reg.fit(X, y)
y_pred = svm_reg.predict(X)
r2_score(y, y_pred)
0.5655477014368502
mean_squared_error(y, y_pred)
459.5387820113037
plt.scatter(X, y)
plt.scatter(X.values, y_pred, color="darkred")
[Figure: observed data with the RBF-kernel SVR predictions (epsilon=0.5, gamma=0.7, C=50) in dark red]
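
All the scores above are computed on the same data used for fitting, and the hyperparameters were changed by hand. One possible alternative, sketched below, is to search the three hyperparameters jointly with cross-validation using GridSearchCV. The synthetic data, the grid values, and the names X_demo, y_demo, and param_grid are assumptions for illustration; with the real data, X_demo and y_demo would be replaced by X and y.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, not regresion.csv).
rng = np.random.default_rng(0)
X_demo = rng.uniform(8.0, 12.0, size=(60, 1))
y_demo = 60.0 + 20.0 * np.sin(X_demo[:, 0]) + rng.normal(0.0, 5.0, size=60)

# Candidate values around those tried by hand above (illustrative choices).
param_grid = {
    "epsilon": [0.1, 0.5, 1.0],
    "gamma": ["scale", 0.5, 0.7],
    "C": [1, 10, 50],
}

# 5-fold cross-validation, scored by R^2 on the held-out folds.
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated R^2 of the best combination
```

Because each score comes from data the model did not see during fitting, the selected combination is less likely to reflect overfitting than a comparison of training R² values.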