by Prof Tim Dodwell

Machine Learning Workflow

13. Is My Model any Good - Validation Plots

Once you have trained a model, and you are comfortable that it has "fitted well", the best thing to look at is how the ML model is performing on its task. How good a machine learning model needs to be depends on the particular applied question you are being asked. For example, for a recommender system suggesting the color of underpants versus a recommender system for air traffic control, the bounds around "good" are understandably very different.

When looking at how good your model is, a first port of call is some simple plots. There are various plots you can look at; here are a few which are widely used in classification and regression problems.

Confusion Matrix - Multiclass Classification


First one up is the confusion matrix. A confusion matrix is a table that is often used to evaluate the performance of a classification model.


The table contains information about the model's predictions on a set of test data, compared to the true labels for that data. Here is the key information in a confusion matrix:

  • The confusion matrix is usually represented as a square matrix, with rows corresponding to the true class labels and columns corresponding to the predicted class labels.

  • The sum of all the numbers in the table equals the total number of test samples.

  • The overall accuracy of the model can be determined by the sum of the diagonal (all correct predictions) divided by the total number of test samples, as in the sketch after this list.

  • There is no reason for the confusion matrix to be symmetric.
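
To make this concrete, here is a minimal sketch using scikit-learn's confusion_matrix; the labels and predictions below are made-up placeholders, not data from this lesson.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder test labels and model predictions for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

# Rows correspond to true labels, columns to predicted labels.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Overall accuracy: sum of the diagonal (correct predictions) over all test samples.
accuracy = np.trace(cm) / cm.sum()
print(f"Accuracy: {accuracy:.2f}")
```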

We can also evaluate metrics for a particular "LABEL". So let's define the four cases we might have:

  1. True Positive (TP): This represents the number of cases where the actual class is "LABEL", and the model correctly predicts it as "LABEL".

  2. False Positive (FP): This represents the number of cases where the actual class is not the "LABEL", but the model incorrectly predicts it as the "LABEL".

  3. True Negative (TN): This represents the number of cases where the actual class is not the "LABEL", and the model correctly predicts it as not the "LABEL".

  4. False Negative (FN): This represents the number of cases where the actual class is the "LABEL", but the model incorrectly predicts it as not the "LABEL".

Once we have each of these counts we can calculate various metrics of the algorithm's ability to classify "LABEL". Let's show these in a reduced confusion matrix.

Here is a picture which unpicks each of these.

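As a sketch of how you might compute these counts yourself (the confusion matrix values here are illustrative placeholders), the following pulls out TP, FP, FN and TN for one chosen label and turns them into precision and recall:

```python
import numpy as np

# Example confusion matrix: rows = true labels, columns = predicted labels.
# The numbers are placeholders, not results from this lesson.
cm = np.array([[50,  3,  2],
               [ 4, 45,  6],
               [ 1,  5, 44]])

def per_label_counts(cm, label):
    """TP, FP, FN, TN for one chosen class index (the "LABEL")."""
    TP = cm[label, label]            # actual LABEL, predicted LABEL
    FP = cm[:, label].sum() - TP     # predicted LABEL, actually another class
    FN = cm[label, :].sum() - TP     # actual LABEL, predicted as another class
    TN = cm.sum() - TP - FP - FN     # everything else
    return TP, FP, FN, TN

TP, FP, FN, TN = per_label_counts(cm, label=1)
precision = TP / (TP + FP)   # of the predicted LABELs, how many were right
recall    = TP / (TP + FN)   # of the actual LABELs, how many were found
print(f"TP={TP}, FP={FP}, FN={FN}, TN={TN}, precision={precision:.2f}, recall={recall:.2f}")
```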

ROC and Area under ROC - Binary Classifiers

The ROC (Receiver Operating Characteristic) curve is a graphical representation of the performance of a binary classification algorithm. It is used to evaluate the trade-off between the true positive rate (TPR) and the false positive rate (FPR) for different classification thresholds.

In the context of binary classification, the ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis, for different threshold values. The TPR, also known as sensitivity, is the proportion of actual positive cases that are correctly identified as positive by the model. The FPR, also known as the fall-out, is the proportion of actual negative cases that are incorrectly identified as positive by the model.
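
Written in terms of the counts defined above (with the positive class playing the role of "LABEL"), these two rates are

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$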


An ideal classifier would have a ROC curve that passes through the upper left corner of the plot, where the TPR is 1 (i.e., all positive cases are correctly classified), and the FPR is 0 (i.e., no negative cases are incorrectly classified as positive). In contrast, a random classifier would have a ROC curve that follows the diagonal line of the plot.

The area under the ROC curve (AUC) is a commonly used metric for evaluating the overall performance of a binary classification algorithm. AUC ranges between 0 and 1, with a value of 1 indicating a perfect classifier, and a value of 0.5 indicating a random classifier. An AUC value of less than 0.5 indicates that the classifier is worse than random, and an AUC value of greater than 0.5 indicates that the classifier is better than random.
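
Here is a minimal sketch of producing such a curve with scikit-learn; the labels and predicted scores are made-up placeholders standing in for your test set and model outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Placeholder binary labels and predicted probabilities of the positive class.
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.75, 0.5])

# FPR/TPR pairs for a sweep of classification thresholds, plus the AUC.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], "k--", label="random classifier")  # the diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```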

Overall, the ROC curve is a useful tool for evaluating the performance of a binary classification algorithm, particularly in situations where the cost of false positives and false negatives may differ.

True Vs. Predicted Plot - Regression

Visualising the performance of a regression model can be quite simple. The natural plot is a simple plot of the true values ($y$) against the predicted values ($\hat{y}$) over the testing set.

So something like this.


In this case, with the "perfect" model we would see all test points line up on the diagonal where $y = \hat{y}$.

Variation from this line represents contributions to the loss function

$$\ell_j = (y_j - \hat{y}_j)^2$$

If there are clear outliers, it is often worth identifying and considering those particular cases.
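
Here is a sketch of such a true-vs-predicted plot, with the $y = \hat{y}$ diagonal and the per-point squared errors; the test values and predictions are synthetic placeholders, not data from this lesson.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the test targets and the model's predictions.
rng = np.random.default_rng(0)
y_test = rng.uniform(0, 10, size=50)
y_pred = y_test + rng.normal(0, 0.8, size=50)

# Scatter of true vs predicted values, with the perfect-model diagonal.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.scatter(y_test, y_pred, alpha=0.7)
plt.plot(lims, lims, "k--", label=r"$y = \hat{y}$")
plt.xlabel(r"True value $y$")
plt.ylabel(r"Predicted value $\hat{y}$")
plt.legend()
plt.show()

# Per-point squared errors: the contributions to the loss discussed above.
errors = (y_test - y_pred) ** 2
print("Largest residuals at test indices:", np.argsort(errors)[-3:])
```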