Why use Assigned Bins in a Classification Predictive Model?
Bin number | Number of customers in the bin | Average probability to buy "P" |
---|---|---|
1 | 100 customers (= 10% of the dataset) | 20% |
2 | 100 customers (= 10% of the dataset) | 18% |
3 | 100 customers (= 10% of the dataset) | 15% |
4 | 100 customers (= 10% of the dataset) | 13% |
5 | 100 customers (= 10% of the dataset) | 11% |
6 | 100 customers (= 10% of the dataset) | 8% |
7 | 100 customers (= 10% of the dataset) | 7% |
8 | 100 customers (= 10% of the dataset) | 4% |
9 | 100 customers (= 10% of the dataset) | 3% |
10 | 100 customers (= 10% of the dataset) | 1% |
Then, you use your predictive model to get predictions on a new set of customers. Let's say your application dataset contains observations on 700 customers.
Bin number | Number of customers in the bin | Estimation of the probability to buy "P" |
---|---|---|
1 | 200 customers (~ 29% of the dataset) | 20% |
2 | 100 customers (~ 14%) | 18% |
3 | 43 customers (~ 6%) | 15% |
4 | 27 customers (~ 4%) | 13% |
5 | 80 customers (~ 11%) | 11% |
6 | 45 customers (~ 6%) | 8% |
7 | 50 customers (~ 7%) | 7% |
8 | 35 customers (~ 5%) | 4% |
9 | 32 customers (~ 5%) | 3% |
10 | 88 customers (~ 13%) | 1% |
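The bin mechanism can be sketched in a few lines: at training time the scored observations are split into deciles, and at application time each new score is mapped back to those same cut points. The probability values and the `assign_bin` helper here are hypothetical illustrations, not Smart Predict's internal implementation.

```python
# Sketch of decile bins built at training time and reused at application time.
# All values here are made up for illustration; Smart Predict's internal cut
# points are not exposed.
import numpy as np

rng = np.random.default_rng(0)

# Training scores: predicted probability to buy "P" for 1000 known customers.
train_scores = rng.uniform(0, 0.25, size=1000)

# Decile cut points, descending: bin 1 holds the 10% highest probabilities,
# bin 10 the lowest. cuts[b] is the lower edge of bin b.
cuts = np.quantile(train_scores, np.linspace(1.0, 0.0, 11))

def assign_bin(score):
    """Return the bin number (1..10) for a predicted probability."""
    for b in range(1, 11):
        if score >= cuts[b]:
            return b
    return 10

# Application scores: 700 new customers are mapped to the *training* bins,
# so the bins may no longer each hold 10% of the observations.
app_scores = rng.uniform(0, 0.25, size=700)
app_bins = [assign_bin(s) for s in app_scores]
```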
Use for assigned bins | Description | Example |
---|---|---|
Simulating/estimating the number of positive cases | At the training step, Smart Predict has assigned each observation to a bin (one bin equals 10% of the dataset), which corresponds to a probability of being a positive case. Smart Predict associates each customer with a probability of buying the product "P" and checks which bin (1, 2, 3, and so on) this probability places the customer in, by referring to the bins defined at the training step. As each bin is associated with an average percentage of positive cases, you can easily estimate the number of positive cases. Note: The distribution of the observations may differ (10% of observations in each bin is not guaranteed). A change in the structure of the population does not mean that the predictive model is no longer relevant (see the next row). | Example: Let's have a look at the example above. At the training step, you know the actual number of positive targets per bin, as you train your predictive model on known data. At the application step, you don't. But once the predictive model is applied, you know which bin each customer of the application dataset belongs to. You can therefore estimate the total number of customers who would buy "P". |
Monitoring the population structure | Dividing the dataset into bins means that each bin should contain roughly 10% of the observations. If this changes, it indicates that your population is changing. For example, advertising on social media sites might attract more young customers than other age groups. It doesn't mean that the predictive model is no longer efficient, but it may be an alert to check its performance with more data from the recent past than the data used to train the model. | Example: Looking back at the example above, you can see that the distribution per bin in the application dataset is not the same as in the training dataset. For bin 1, we have 200 customers, which corresponds to about 29% of the dataset. It could simply be because you have more young customers, with the same buying behaviour as the young customers in the training population. |
Monitoring the predictive model performance | Once the predictive model has been applied, it is easier to analyze the classification performance by bins than to interpret the performance curve. Use the classification rate (see The Metrics) calculated at the training step for each bin, and detect any variation in this rate when applying your predictive model. |
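The estimation described in the first row of the table amounts to weighting each bin's customer count from the application table by the training-step average probability. Using the numbers from the two tables above:

```python
# Expected number of buyers in the application dataset: each bin's customer
# count (application table) times the average probability to buy "P" observed
# for that bin at the training step.
bin_counts = [200, 100, 43, 27, 80, 45, 50, 35, 32, 88]   # application dataset
bin_probs = [0.20, 0.18, 0.15, 0.13, 0.11, 0.08, 0.07, 0.04, 0.03, 0.01]  # training averages

expected_buyers = sum(n * p for n, p in zip(bin_counts, bin_probs))
print(round(expected_buyers))  # roughly 87 of the 700 customers
```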
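The population-structure check in the second row can likewise be automated: compare each bin's share of the application dataset against the roughly 10% expected from training. The 5-percentage-point alert threshold below is an arbitrary choice for illustration:

```python
# Flag bins whose share of the application dataset drifts far from the ~10%
# expected at training time. The 5-point threshold is arbitrary, for illustration.
bin_counts = [200, 100, 43, 27, 80, 45, 50, 35, 32, 88]
total = sum(bin_counts)

flagged = []
for bin_no, n in enumerate(bin_counts, start=1):
    share = 100 * n / total
    if abs(share - 10) > 5:
        flagged.append(bin_no)
        print(f"bin {bin_no}: {share:.0f}% of observations (expected ~10%)")
```

With the counts above, bins 1, 4, and 9 are flagged as candidates for a closer look.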