Classification trees - code
Import libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Import data:
df = pd.read_csv("Clasificación.csv", sep=";", decimal=",")
print(df.head())
X1 X2 y
0 50.24 10.06 1
1 47.71 9.16 0
2 48.10 10.18 1
3 52.77 10.24 1
4 49.48 9.57 0
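The file is read with `sep=";"` and `decimal=","`: semicolons separate the columns and commas mark the decimals. The following is a minimal sketch of that parsing. It assumes the CSV has the layout implied by the rows printed above. The in-memory sample `raw` is hypothetical, rebuilt from the `df.head()` output rather than taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical sample with the assumed layout: ';' between columns,
# ',' as decimal mark (values copied from the df.head() output above)
raw = """X1;X2;y
50,24;10,06;1
47,71;9,16;0
48,10;10,18;1
"""

df_demo = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(df_demo.dtypes)  # X1 and X2 are parsed as floats, y as integers
print(df_demo)
```

Without `decimal=","` pandas would leave values such as `50,24` as strings.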
Data visualization:
plt.scatter(df["X1"], df["X2"], marker="^", c=df["y"], cmap=plt.cm.RdYlGn)  # colour by class y
plt.xlabel("X1")
plt.ylabel("X2")
plt.show()
![../../../_images/output_6_11.png](../../../_images/output_6_11.png)
X = df[["X1", "X2"]]
print(X.head())
X1 X2
0 50.24 10.06
1 47.71 9.16
2 48.10 10.18
3 52.77 10.24
4 49.48 9.57
y = df["y"]
print(y.head())
0 1
1 0
2 1
3 1
4 0
Name: y, dtype: int64
Feature scaling:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
print(X[:10])  # first 10 rows of the standardized feature matrix
[[ 0.29938111 0.48540279]
[-1.17998259 -2.03617016]
[-0.95193838 0.82161252]
[ 1.77874481 0.98971739]
[-0.14501273 -0.88745359]
[ 0.54496718 1.69015432]
[ 0.36954856 2.41860873]
[ 1.0010556 -0.04692927]
[ 0.67945479 1.04575234]
[ 0.32277026 -1.13961089]]
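`StandardScaler` replaces each column by (x - mean) / standard deviation. It uses the population standard deviation (`ddof=0`). The sketch below checks this on synthetic data. `X_demo` is a made-up stand-in with roughly the scale of `X1` and `X2`, not the notebook's `X`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic two-column data (illustrative assumption, not the CSV)
X_demo = rng.normal(loc=[50.0, 10.0], scale=[2.0, 0.5], size=(40, 2))

Z = StandardScaler().fit_transform(X_demo)

# Manual standardization: z = (x - column mean) / column std (ddof=0)
Z_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(Z, Z_manual))
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))  # about 0 and 1
```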
Model fitting:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
DecisionTreeClassifier(random_state=0)
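`clf` was fitted on the standardized features. Note, however, that a decision tree splits each feature at a threshold. An increasing per-feature linear rescaling such as standardization therefore changes the threshold values but not which samples fall on each side.

The sketch below compares a tree fitted on raw features with one fitted on standardized features. It uses synthetic data as an assumption. The scale factors and `max_depth=3` are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data (assumption; the notebook's CSV is not used)
X_raw, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_raw = X_raw * np.array([5.0, 0.5])  # give the features different scales

X_std = StandardScaler().fit_transform(X_raw)

tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_raw, y_demo)
tree_std = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_std, y_demo)

# Same partition of the training samples, hence the same predictions;
# only the thresholds are expressed in different units
same = np.array_equal(tree_raw.predict(X_raw), tree_std.predict(X_std))
print(same)
```

In this notebook the scaling is therefore not what changes the tree's fit. It does determine the units of the plotting grid and of the thresholds shown by `plot_tree`.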
y_pred = clf.predict(X)
print(y_pred)
[1 0 1 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 1
1 0 0]
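`predict` assigns each sample the majority class of the leaf it reaches. `predict_proba` returns the fraction of each class in that leaf. The small check below uses synthetic data, with hypothetical `X_demo`, `y_demo` and `clf_demo`. It relies on scikit-learn's documented behaviour that `predict` picks the class with the highest leaf probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's data
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=1
)
clf_demo = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

proba = clf_demo.predict_proba(X_demo)  # class fractions in each sample's leaf
leaves = clf_demo.apply(X_demo)         # index of the leaf each sample reaches
pred = clf_demo.predict(X_demo)

# The predicted class is the one with the highest leaf probability
print(np.array_equal(pred, clf_demo.classes_[proba.argmax(axis=1)]))
print(np.unique(leaves).size, "distinct leaves reached")
```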
Performance evaluation:
from sklearn.metrics import accuracy_score
accuracy_score(y, y_pred)
1.0
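The model is overfitted. With the default `max_depth=None`, the tree keeps splitting until its leaves are pure, so it reproduces the training labels exactly. An accuracy of 1.0 measured on the same data used for fitting therefore does not show how well the tree performs on new data.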
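The accuracy above is computed on the training data. One common way to expose overfitting is to hold out part of the data and score the tree on it. The sketch below does this with `train_test_split` on synthetic, deliberately noisy data (`flip_y=0.2`). This data is an assumption, not the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data with label noise (illustrative assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    flip_y=0.2, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # max_depth=None

train_acc = accuracy_score(y_tr, deep.predict(X_tr))
test_acc = accuracy_score(y_te, deep.predict(X_te))
print(f"train: {train_acc:.2f}  test: {test_acc:.2f}")
```

A large gap between the two scores is the usual symptom of an overfitted tree.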
Visualization of the results:
from matplotlib.colors import ListedColormap
X_Set, y_Set = X, y
# Grid covering the standardized feature space with a 1-unit margin
X1, X2 = np.meshgrid(
    np.arange(start=X_Set[:, 0].min() - 1, stop=X_Set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_Set[:, 1].min() - 1, stop=X_Set[:, 1].max() + 1, step=0.01),
)
# Predict the class at every grid point and shade the decision regions
plt.contourf(
    X1,
    X2,
    clf.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
    alpha=0.75,
    cmap=ListedColormap(("#F0566F", "#51F192")),
)
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the training points, one colour per class
for i, j in enumerate(np.unique(y_Set)):
    plt.scatter(
        X_Set[y_Set == j, 0],
        X_Set[y_Set == j, 1],
        color=ListedColormap(("red", "green"))(i),  # color= avoids matplotlib's value-mapping warning
        label=j,
    )
plt.title("Decision tree (max_depth=None)")
plt.xlabel("X1 (standardized)")
plt.ylabel("X2 (standardized)")
plt.legend()
plt.show()
![../../../_images/output_22_14.png](../../../_images/output_22_14.png)
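The decision-region plot works in three steps. It builds a dense grid with `np.meshgrid`, predicts the class at every grid point, and draws the predictions with `contourf`.

The grid step can be factored into a helper. `decision_grid` below is a hypothetical name, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def decision_grid(model, X, step=0.01, margin=1.0):
    """Hypothetical helper: predict the class at every point of a 2-D mesh."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    # Stack the mesh into an (n_points, 2) matrix, predict, reshape to the mesh
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


# Synthetic stand-in for the standardized X and y
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)
xx, yy, zz = decision_grid(model, X_demo, step=0.05)
print(xx.shape, zz.shape, np.unique(zz))
```

The returned `xx`, `yy`, `zz` can be passed to `plt.contourf(xx, yy, zz, alpha=0.75, cmap=...)` as in the plotting cell above.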
Tree visualization:
from sklearn import tree
feature_names = df.columns.values[0:2]  # "X1", "X2"
plt.figure(figsize=(15, 10))
# Split thresholds are in standardized units, because clf was fitted on the scaled X
tree.plot_tree(clf, feature_names=feature_names, filled=True);
![../../../_images/output_25_01.png](../../../_images/output_25_01.png)
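`plot_tree` draws the fitted tree as a figure. The same structure can be printed as text with `sklearn.tree.export_text`, which is handy for reading thresholds without a plot. The sketch below uses a synthetic two-feature dataset, with `clf_demo` as a stand-in for `clf`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the notebook's data
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
clf_demo = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

# One line per split ("X1 <= ...") or leaf ("class: ...")
report = export_text(clf_demo, feature_names=["X1", "X2"])
print(report)
```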
Regularization
We limit `max_depth`, which defaults to `None` (no depth limit), to 2.
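With `max_depth=None`, nothing stops the tree from growing until its leaves are pure. With `max_depth=2` it can make at most two levels of splits, hence at most four leaves. The sketch below compares the size of both fitted trees with `get_depth()` and `get_n_leaves()`. It uses synthetic noisy data as an assumption, not the notebook's CSV.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data with label noise (illustrative assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    flip_y=0.1, random_state=0,
)

sizes = {}
for depth in (None, 2):
    m = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_demo, y_demo)
    # get_depth() and get_n_leaves() describe the fitted tree
    sizes[depth] = (m.get_depth(), m.get_n_leaves())
    print(f"max_depth={depth}: depth={sizes[depth][0]}, leaves={sizes[depth][1]}")
```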
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)
y_pred = clf.predict(X)
accuracy_score(y, y_pred)
0.95
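The 0.95 above is again training accuracy, so on its own it does not show that `max_depth=2` generalizes better. The sketch below chooses `max_depth` by 5-fold cross-validation with `GridSearchCV`. The data are synthetic and the candidate depths are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (illustrative assumption, not the notebook's CSV)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    flip_y=0.2, random_state=0,
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5, None]},
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",  # mean held-out accuracy per candidate
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```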
X_Set, y_Set = X, y
# Grid covering the standardized feature space with a 1-unit margin
X1, X2 = np.meshgrid(
    np.arange(start=X_Set[:, 0].min() - 1, stop=X_Set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_Set[:, 1].min() - 1, stop=X_Set[:, 1].max() + 1, step=0.01),
)
# Predict the class at every grid point and shade the decision regions
plt.contourf(
    X1,
    X2,
    clf.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
    alpha=0.75,
    cmap=ListedColormap(("#F0566F", "#51F192")),
)
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the training points, one colour per class
for i, j in enumerate(np.unique(y_Set)):
    plt.scatter(
        X_Set[y_Set == j, 0],
        X_Set[y_Set == j, 1],
        color=ListedColormap(("red", "green"))(i),  # color= avoids matplotlib's value-mapping warning
        label=j,
    )
plt.title("Decision tree (max_depth=2)")
plt.xlabel("X1 (standardized)")
plt.ylabel("X2 (standardized)")
plt.legend()
plt.show()
![../../../_images/output_29_11.png](../../../_images/output_29_11.png)
plt.figure(figsize=(10, 5))
tree.plot_tree(clf, feature_names=feature_names, filled=True);
![../../../_images/output_30_02.png](../../../_images/output_30_02.png)
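`X` was overwritten with the scaled array. Any new observation in the original units, such as the raw `X1`, `X2` values in the CSV, therefore has to go through `scaler.transform` before `clf.predict`.

The sketch below bundles both steps with `make_pipeline` on synthetic data. `new_points` are hypothetical observations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the raw (unscaled) features and labels
X_demo, y_demo = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Scaler and tree are fitted together; predict() applies the scaling first
pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(max_depth=2, random_state=0))
pipe.fit(X_demo, y_demo)

new_points = np.array([[0.5, -1.0], [-2.0, 2.0]])  # hypothetical raw observations
print(pipe.predict(new_points))

# Equivalent manual route: transform with the fitted scaler, then predict
scaler_step = pipe.named_steps["standardscaler"]
tree_step = pipe.named_steps["decisiontreeclassifier"]
manual = tree_step.predict(scaler_step.transform(new_points))
print(manual)
```

The pipeline keeps the scaler's fitted mean and standard deviation tied to the tree, so the two cannot get out of sync.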