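Why use Assigned Bins in a Regression Predictive Model?
Let's say your training dataset contains observations on 3,000 customers and your regression model predicts a dollar amount for each of them. The Assigned Bins split these training predictions into 10 bins that each contain the same number of customers: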
Bin number | Number of customers per bin | Range of predicted values ($) |
---|---|---|
1 | 300 customers (10% of the training dataset) | 90,001 to 100,000 |
2 | 300 customers (10% of the training dataset) | 80,001 to 90,000 |
3 | 300 customers (10% of the training dataset) | 70,001 to 80,000 |
4 | 300 customers (10% of the training dataset) | 60,001 to 70,000 |
5 | 300 customers (10% of the training dataset) | 50,001 to 60,000 |
6 | 300 customers (10% of the training dataset) | 40,001 to 50,000 |
7 | 300 customers (10% of the training dataset) | 30,001 to 40,000 |
8 | 300 customers (10% of the training dataset) | 20,001 to 30,000 |
9 | 300 customers (10% of the training dataset) | 10,001 to 20,000 |
10 | 300 customers (10% of the training dataset) | 0 to 10,000 |
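The binning step can be sketched in a few lines. This sketch assumes that Assigned Bins are equal-frequency bins (deciles) computed on the training predictions, with bin 1 holding the highest values as in the table above; the function names and the synthetic training data are hypothetical and not part of any product API.

```python
import bisect
import statistics
from collections import Counter


def training_bin_edges(train_predictions, num_bins=10):
    """Return the num_bins - 1 cut points that split the training predictions
    into num_bins bins holding (roughly) the same number of observations."""
    return statistics.quantiles(train_predictions, n=num_bins)


def assign_bin(prediction, edges):
    """Return the bin number for one prediction; bin 1 holds the highest values."""
    # bisect_left counts the cut points strictly below the prediction (0..len(edges)),
    # so the order is reversed to make bin 1 the top bin, as in the table.
    return len(edges) + 1 - bisect.bisect_left(edges, prediction)


if __name__ == "__main__":
    # Hypothetical training predictions: 3,000 values spread evenly between 0 and 100,000.
    train_predictions = [(i + 0.5) * 100_000 / 3_000 for i in range(3_000)]
    edges = training_bin_edges(train_predictions)
    counts = Counter(assign_bin(p, edges) for p in train_predictions)
    for b in range(1, len(edges) + 2):
        print(f"bin {b:2d}: {counts[b]} customers")  # 300 customers in every bin
```

Because the synthetic predictions are spread evenly, the cut points land close to 10,000, 20,000, and so on. With real, skewed predictions, equal-frequency bins still hold 300 customers each, but their value ranges have unequal widths.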
Then, you use your predictive model to get predictions on a new set of customers. Let's say your application dataset contains observations on 800 customers. Each new prediction is assigned to the bin whose range contains it:
Bin number | Number of customers per bin | Range of predicted values ($) |
---|---|---|
1 | 110 customers (~14% of the application dataset) | 90,001 to 100,000 |
2 | 100 customers (~13% of the application dataset) | 80,001 to 90,000 |
3 | 95 customers (~12% of the application dataset) | 70,001 to 80,000 |
4 | 85 customers (~11% of the application dataset) | 60,001 to 70,000 |
5 | 80 customers (~10% of the application dataset) | 50,001 to 60,000 |
6 | 85 customers (~11% of the application dataset) | 40,001 to 50,000 |
7 | 75 customers (~9% of the application dataset) | 30,001 to 40,000 |
8 | 60 customers (~7.5% of the application dataset) | 20,001 to 30,000 |
9 | 60 customers (~7.5% of the application dataset) | 10,001 to 20,000 |
10 | 40 customers (~5% of the application dataset) | 0 to 10,000 |
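Assigning new predictions to the training bins and computing the share of each bin could look like the following sketch. It assumes the bin boundaries are the round values shown in the tables (in practice they would come from the training predictions); the function names and the sample predictions are hypothetical.

```python
import bisect
from collections import Counter

# Upper bounds of bins 10..2, taken from the example tables (hypothetical round
# values; in practice these cut points would come from the training predictions).
EDGES = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000, 80_000, 90_000]


def assign_bin(prediction, edges=EDGES):
    """Return the bin number (1 = highest predictions) for one predicted value."""
    # bisect_left counts the cut points strictly below the prediction, so a value
    # equal to a cut point (e.g. 90,000) stays in the lower bin, as in the tables.
    return len(edges) + 1 - bisect.bisect_left(edges, prediction)


def bin_shares(predictions, edges=EDGES):
    """Return {bin number: share of predictions in that bin} for every bin."""
    counts = Counter(assign_bin(p, edges) for p in predictions)
    total = len(predictions)
    return {b: counts[b] / total for b in range(1, len(edges) + 2)}


if __name__ == "__main__":
    # Hypothetical predictions for a small application dataset.
    new_predictions = [95_000, 91_500, 85_000, 42_000, 5_000, 12_000, 90_000, 10_000]
    for b, share in bin_shares(new_predictions).items():
        print(f"bin {b:2d}: {share:.1%}")
```

Only the cut points between bins are stored, so values above 100,000 fall into bin 1 and negative values into bin 10.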
You can use Assigned Bins to monitor the structure of your population. Because each bin contains about 10% of the training observations, a noticeable increase or decrease of that share in one or several bins indicates that your population is changing, and you might need to retrain your predictive model with more recent data.

In the example above, the distribution across bins in the application dataset is fairly similar to that of the training dataset: each bin holds between about 5% and 14% of the customers. The results could, however, be very different. For example, bin 1 could contain 300 customers, which would correspond to 37.5% of the application dataset.
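The monitoring rule can be sketched as a simple threshold check. This is an illustrative assumption, not a documented product rule: the 5-percentage-point tolerance, the function name, and the bin counts are hypothetical, except for the 300 customers out of 800 in bin 1, which come from the example above.

```python
def drifted_bins(counts, expected_share=0.10, tolerance=0.05):
    """Return {bin: observed share} for bins whose share of the application
    dataset differs from the expected share by more than `tolerance`."""
    # Hypothetical rule: flag a bin if its share moves more than 5 points from 10%.
    total = sum(counts.values())
    flagged = {}
    for bin_number, n in sorted(counts.items()):
        share = n / total
        if abs(share - expected_share) > tolerance:
            flagged[bin_number] = share
    return flagged


if __name__ == "__main__":
    # Hypothetical application counts: 800 customers, 300 of them in bin 1.
    counts = {1: 300, 2: 60, 3: 60, 4: 60, 5: 60, 6: 60, 7: 60, 8: 50, 9: 45, 10: 45}
    for bin_number, share in drifted_bins(counts).items():
        print(f"bin {bin_number}: {share:.1%} of customers instead of about 10%")
    # prints "bin 1: 37.5% of customers instead of about 10%"
```

A fixed tolerance is the simplest possible rule and ignores the size of the application dataset. Statistical alternatives, such as a chi-square goodness-of-fit test or the population stability index (PSI), compare the whole distribution across bins at once.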