Model Performance Measurement
Unit 11.1
Confusion Matrix
Confusion matrices help evaluate classification models by showing the true positives, false positives, true negatives, and false negatives.
The following example builds a confusion matrix from two small label vectors; the first argument to confusion_matrix holds the actual values and the second holds the predictions:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
(tn, fp, fn, tp)
- True Negatives (TN): Cases where the model correctly predicted 0. (In this case 0.)
- False Positives (FP): Cases where the model incorrectly predicted 1 when it was actually 0. (In this case 2.)
- False Negatives (FN): Cases where the model incorrectly predicted 0 when it was actually 1. (In this case 1.)
- True Positives (TP): Cases where the model correctly predicted 1. (In this case 1.)
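As a quick sanity check, the four counts can be tallied directly from the two label lists. The sketch below reproduces the ravel() output by hand for the same toy data:
# Manual tally of the four confusion-matrix cells, same toy labels as above.
y_true = [0, 1, 0, 1]   # actual labels (first argument to confusion_matrix)
y_pred = [1, 1, 1, 0]   # predicted labels (second argument)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(tn, fp, fn, tp)   # 0 2 1 1, matching the ravel() result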
In a more realistic workflow, the confusion matrix is computed from a trained classifier's predictions and then visualized to show where the model misclassifies, as shown below:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()
plt.show()
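When the classes are imbalanced, raw counts can be hard to compare, so it is often useful to look at rates instead. As a small variation on the example above (not part of the original), confusion_matrix accepts a normalize argument; normalize='true' divides each row by the number of actual instances of that class:
# Reuses y_test, predictions, and clf from the example above.
cm_norm = confusion_matrix(y_test, predictions, labels=clf.classes_, normalize='true')
disp = ConfusionMatrixDisplay(confusion_matrix=cm_norm, display_labels=clf.classes_)
disp.plot()
plt.show()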
F1, Accuracy, Recall, AUC, and Precision Scores
F1 Score
The F1 score is the harmonic mean of precision and recall, balancing their trade-offs. It’s particularly useful for imbalanced datasets.
from sklearn.metrics import f1_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f"Macro F1: {f1_score(y_true, y_pred, average='macro')}")
print(f"Micro F1: {f1_score(y_true, y_pred, average='micro')}")
print(f"Weighted F1: {f1_score(y_true, y_pred, average='weighted')}")
print(f"No Average F1: {f1_score(y_true, y_pred, average=None)}")
Macro F1: 0.26666666666666666
Micro F1: 0.3333333333333333
Weighted F1: 0.26666666666666666
No Average F1: [0.8 0. 0. ]
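Because the F1 score is the harmonic mean of precision and recall, the per-class values above can be reproduced by hand. A minimal sketch for the same labels:
from sklearn.metrics import precision_score, recall_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
p = precision_score(y_true, y_pred, average=None, zero_division=0)
r = recall_score(y_true, y_pred, average=None, zero_division=0)
f1_manual = [2 * pi * ri / (pi + ri) if pi + ri > 0 else 0.0 for pi, ri in zip(p, r)]
print(f1_manual)   # [0.8, 0.0, 0.0], matching average=None above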
Accuracy
Accuracy is the ratio of correctly predicted observations to total observations.
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
2/4 = 0.5
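The same value follows directly from the definition; a minimal check in plain Python:
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
print(correct / len(y_true))   # 0.5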
Precision
Precision is the ratio of true positives to the sum of true positives and false positives. So even if accuracy is low, precision can be maximal, as in the following example:
from sklearn.metrics import precision_score
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 1, 1]
precision_score(y_true, y_pred)
1/1 = 1
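Counting by hand shows why precision can be perfect while accuracy is only 0.5 here: only the predictions of class 1 enter the calculation. A minimal check:
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 1, 1]
tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
print(tp / (tp + fp))   # 1.0, even though accuracy is 2/4 = 0.5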
In a multiclass scenario, precision is computed per class and then averaged; here the macro average is used:
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
precision_score(y_true, y_pred, average='macro')
0.2222222222222222
Recall
Recall is the ratio of true positives to the sum of true positives and false negatives.
from sklearn.metrics import recall_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
recall_score(y_true, y_pred, average='macro')
0.3333333333333333
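Both macro scores can also be read off the multiclass confusion matrix: each class's recall is its diagonal entry divided by its row sum (actual instances), and its precision is the diagonal entry divided by its column sum (predicted instances). A sketch for the same labels:
import numpy as np
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
diag = np.diag(cm).astype(float)
recall_per_class = diag / cm.sum(axis=1)      # divide by actual counts per class
precision_per_class = diag / cm.sum(axis=0)   # divide by predicted counts per class
print(recall_per_class.mean())      # 0.3333..., the macro recall
print(precision_per_class.mean())   # 0.2222..., the macro precision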
Classification Report
The classification report gives a per-class overview of precision, recall, F1 score, and support (the number of true instances of each class).
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
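When the metrics need to be consumed programmatically rather than printed, classification_report can return a dictionary instead of a string via its output_dict parameter; a brief sketch using the same labels:
report = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
print(report['class 2']['f1-score'])   # 0.8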
AUC (Area Under the Curve)
The AUC measures the area under the ROC curve: how well the model ranks positive instances above negative ones across all classification thresholds. A score of 1.0 indicates perfect separation, while 0.5 is no better than random guessing.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)
roc_auc_score(y, clf.predict_proba(X)[:, 1])
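Note that the score above is computed on the same data the model was trained on, which tends to be optimistic. A minimal variation (not part of the original example) that evaluates on a held-out split:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(solver="liblinear", random_state=0).fit(X_train, y_train)
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))   # AUC on unseen data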
Regression Metrics
RMSE (Root Mean Square Error)
The square root of the average squared prediction error; because errors are squared before averaging, large errors are penalized more heavily than in MAE.
from sklearn.metrics import mean_squared_error
import numpy as np
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mse = mean_squared_error(y_true, y_pred)   # mean squared error: 0.375
rmse = np.sqrt(mse)                        # RMSE is the square root of the MSE
rmse
MAE (Mean Absolute Error)
Measures the average magnitude of the prediction errors, ignoring their sign.
from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)
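The result matches a direct average of the absolute errors; a minimal check with NumPy:
import numpy as np
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
print(np.mean(np.abs(y_true - y_pred)))   # 0.5, same as mean_absolute_error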
R Squared (Coefficient of Determination)
Measures the proportion of variance in the dependent variable that is explained by the predictions; 1.0 is a perfect fit and 0.0 means the model does no better than always predicting the mean.
from sklearn.metrics import r2_score
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)
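R squared compares the model's squared error against that of a baseline that always predicts the mean of y_true; a minimal sketch computing 1 - SS_res / SS_tot directly:
import numpy as np
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
print(1 - ss_res / ss_tot)                        # ~0.9486, same as r2_score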