Confusion Matrix
The Confusion Matrix, also known as an error matrix, is a table that shows the performance of a classification predictive model by comparing the predicted value of the target variable with its actual value.
| Total | Predicted 1 (= Positive Targets Predicted) | Predicted 0 (= Negative Targets Predicted) |
|---|---|---|
| Actual 1 (= Actual Positive Targets) | Number of correctly predicted positive targets (True Positive = TP) | Number of actual positive targets that have been predicted negative (False Negative = FN) |
| Actual 0 (= Actual Negative Targets) | Number of actual negative targets that have been predicted positive (False Positive = FP) | Number of correctly predicted negative targets (True Negative = TN) |
- Positive target (Predicted 1 and Actual 1): An observation that belongs to the population you want to target.
- Negative target (Predicted 0 and Actual 0): An observation that does not belong to this target population.
The Confusion Matrix reports the number of false positive, false negative, true positive, and true negative targets. It is a good estimator of the error that would occur when applying the predictive model on new data with similar characteristics.
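As a purely illustrative sketch (not part of Smart Predict itself), the four cells of the Confusion Matrix can be counted from the actual and predicted classes of each observation; the lists below are invented for the example.

```python
# Invented example data: 1 = positive target, 0 = negative target.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # True Positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # False Negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # False Positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # True Negatives

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=1, TN=5
```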
By default, the Total Population is the number of records in the validation data source. This is a part of your training data source that Smart Predict keeps separate from the training data, and uses to test the predictive model's performance.
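Smart Predict performs this split for you; purely as an analogy, the same holdout idea can be sketched in Python with scikit-learn (the data, the 25% holdout size, and the variable names below are assumptions for illustration).

```python
from sklearn.model_selection import train_test_split

# Invented data: one feature column and a binary target (1 = positive target).
X = [[i] for i in range(20)]
y = [1 if i % 4 == 0 else 0 for i in range(20)]

# Keep 25% of the records aside as a validation set the model never sees.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
# The model is trained on (X_train, y_train); the Confusion Matrix is then
# built by comparing its predictions on X_valid with the actual values y_valid.
```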
The classification model allows you to sort the Total Population from the lowest to the highest probability. To get the predicted category, which is what you are interested in, you need to choose the threshold that determines which observations fall into that category and which do not. Sliding the threshold bar lets you experiment with this value and see the resulting Confusion Matrix for the population on which you want to apply your predictive model; a small sketch after the list below illustrates the effect. You can set the threshold using either of the following options:
- Contacted Population: You select the percentage of the population to target.
- Detected Target: You select the percentage of positive targets you want to detect.
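The following sketch, with invented probability scores and actual classes, shows what sliding the threshold does: each threshold value turns the scores into predicted categories and therefore produces a different Confusion Matrix.

```python
# Invented scores (sorted from highest to lowest probability) and actual classes.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
actual = [1,    1,    0,    1,    0,    0,    1,    0]

def confusion_matrix(threshold):
    """Turn probability scores into predicted categories and count the four cells."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

# "Sliding" the threshold: each value yields a different Confusion Matrix.
for threshold in (0.25, 0.50, 0.75):
    print(threshold, confusion_matrix(threshold))  # (TP, FN, FP, TN)
```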
Refer to the section How is a Decision Made For a Classification Result? for information on how Smart Predict automatically sets the threshold.
Use the Confusion Matrix to:
- Get a detailed assessment of your predictive model's quality, because it takes into account a selected threshold that transforms a range of probability scores into a predicted category. You can also use standard metrics such as specificity. For more information, see the related link.
- Estimate the expected profit, based on the costs and profits associated with the predicted positive and actual positive targets; a sketch after this list shows the calculation. For more information, see the related link.
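For example, standard metrics such as specificity, and an expected profit based on cost and profit assumptions, can be derived directly from the four cell counts. The counts and the per-target cost and profit figures below are assumptions chosen only to illustrate the calculation.

```python
# Assumed Confusion Matrix cell counts.
tp, fn, fp, tn = 120, 30, 50, 800

specificity = tn / (tn + fp)  # share of actual negatives correctly predicted
sensitivity = tp / (tp + fn)  # share of actual positives correctly predicted

# Assumed business figures: profit earned per detected positive target,
# and cost incurred for every predicted positive (contacted) target.
profit_per_true_positive = 40.0
cost_per_contact = 5.0

expected_profit = tp * profit_per_true_positive - (tp + fp) * cost_per_contact

print(f"specificity={specificity:.2f}, sensitivity={sensitivity:.2f}")
print(f"expected profit={expected_profit:.2f}")  # 120*40 - 170*5 = 3950.00
```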
In a business scenario where you want to detect fraudulent credit card transactions, the False Negative (FN) count can be a better metric than the Classification rate. If your predictive model for detecting fraudulent transactions always predicts "non-fraudulent", the Classification rate can still be 99.9%.
Such a classification rate looks excellent, but it isn't a reliable metric for evaluating the real performance of your predictive model because it gives misleading results. These results are usually due to an imbalanced data source, where the number of samples varies widely between classes.
This performance issue shows up in the error matrix as a high False Negative (FN) count (the number of actual fraudulent transactions that the predictive model classifies as non-fraudulent).
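A quick numeric sketch of this effect, with invented figures: out of 10,000 transactions only 10 are fraudulent, so a predictive model that always predicts "non-fraudulent" still reaches a 99.9% classification rate while missing every single fraud.

```python
# Invented figures: 10,000 transactions, of which 10 are fraudulent (positive target).
total_transactions = 10_000
actual_frauds = 10

# A model that always predicts "non-fraudulent" (the negative class).
tp, fp = 0, 0
fn = actual_frauds                       # every fraudulent transaction is missed
tn = total_transactions - actual_frauds  # every legitimate transaction is "correct"

classification_rate = (tp + tn) / total_transactions
print(f"classification rate = {classification_rate:.1%}")  # 99.9%
print(f"false negatives = {fn}")                           # all 10 frauds missed
```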