Generating and Saving the Predictions for a Classification or Regression Predictive Model
Context
You want to generate and save the predictions for a predictive model of type classification or regression.
Procedure
- Open the relevant predictive model.
- Click Apply Predictive Model.
The Apply Predictive Model window opens.
- In the Apply To Population section, select the application dataset to which you want to apply your predictive model. This dataset must be prepared beforehand; it cannot be created at this step.
- In the Generated Dataset section, select the additional columns you want to have in your generated dataset:
- Replicated Column: select which columns from the training data source should be replicated in the generated dataset.
Restriction
If your application dataset contains more columns than your training dataset, the additional columns will be ignored by the application process.
- Statistics & Predictions: This is information about your predictive model that you want to have
in the generated dataset.
Information | Description | Comments
Apply Date | The start date of the predictive model application. | The type of the column is TIMESTAMP.
Train Date | The start date of the predictive model training. | The type of the column is TIMESTAMP.
- Statistics: select the statistics regarding the influencers you want to save in your dataset:
Statistic
Description
Assigned Bin
When selected, individuals in the application population are assigned to reference quantiles defined on the validation population.
Assigned bins explained: During training, the validation population is split into quantiles (bins), each defined by a range of scores. These bins serve as references (assigned bins) for an application population. When a predictive model is applied, each individual in the application population is allocated to an assigned bin based on its predicted score. As each assigned bin represents 10% of the training population, this percentage should remain stable on the application population if the population structure is unchanged. If it does not, this doesn't mean that the predictive model is no longer accurate, but rather that the structure of the population has changed. For example, there may be more or fewer potential churners now than in the past. The accuracy of the predictions should be monitored to back up the decisions.
Note
The number of bins is set to 10 and isn't customizable.
See the section How does Smart Predict Create Assigned Bins? for information on using assigned bins.
Outlier Indicator
For each row in the application dataset, the Outlier Indicator is 1 if the row is an outlier with respect to the target, otherwise 0.
An observation is considered an outlier when the prediction error is greater than 3 times the average prediction error found on similar observations.
- Predictions: select the predictions to include in the output table:
Prediction
Description
Predicted Category
Classification predictive models (nominal target with 2 values only)
For each row in the application dataset, the Predicted Category is the target category determined by the predictive model.
The percentage of predicted target categories found in the application dataset corresponds to the Contacted Population percentage that is set by default when entering the Confusion Matrix.
Any change made by the user in the Confusion Matrix does not affect the Predicted Category in the generated dataset.
An alternative is to generate the Prediction Probability (instead of the Predicted Category) and set a decision threshold on the probability value based on the business requirements (see How is a Decision Made For a Classification Result?).
Prediction Probability
Classification predictive models (nominal target with 2 values only)
For each row in the application dataset, the Prediction Probability is the probability that the Predicted Category is the target value.
Predicted Value
Regression predictive models (continuous target)
For each row in the application dataset, the Predicted Value is the value predicted for the target.
Prediction Explanations
Classification and regression predictive models
For each row of the application dataset, the Prediction Explanations is a set of explanations for the prediction.
Note
If you do not select any statistics or predictions, only the target and the key influencer(s) are included.
- Output as: Give a name to your generated dataset.
- Click Apply.
The status of your predictive model is updated to <Applied>. You can find your generated dataset with the predictions by viewing the Recent Files (from the side navigation, choose (Datasets) Recent Files) or by going to the Files page (from the side navigation, choose Files), where you can search for the file. You can then access your results directly by opening the generated dataset or, depending on your business needs, consume the output dataset in a BI story.
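Two of the ideas described in the procedure, assigned bins and a decision threshold on the Prediction Probability, can be illustrated outside the application. The Python sketch below is a minimal illustration only, not Smart Predict's implementation. The function names, the sample scores, the 0.8 threshold, and the convention that bin 1 holds the lowest scores are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import Counter


def decile_boundaries(validation_scores):
    """Return the 9 cut points that split the validation scores into 10 bins."""
    ordered = sorted(validation_scores)
    n = len(ordered)
    # Cut point k is the first score of the (k+1)-th tenth of the population.
    return [ordered[(k * n) // 10] for k in range(1, 10)]


def assign_bin(score, boundaries):
    """Return a bin number from 1 to 10 (assumption: bin 1 = lowest scores)."""
    return bisect_right(boundaries, score) + 1


def bin_shares(scores, boundaries):
    """Return the fraction of the population that falls in each of the 10 bins."""
    counts = Counter(assign_bin(s, boundaries) for s in scores)
    return {b: counts.get(b, 0) / len(scores) for b in range(1, 11)}


def decide(probability, threshold):
    """Return True when the probability reaches the business-defined threshold."""
    return probability >= threshold


# Hypothetical data: evenly spread validation scores, and an application
# population whose scores are shifted toward the high end.
validation = [i / 100 for i in range(100)]
application = [0.5 + i / 200 for i in range(100)]

cuts = decile_boundaries(validation)
print(bin_shares(validation, cuts))   # every bin holds 10% of the validation population
print(bin_shares(application, cuts))  # bins 1 to 5 are empty: the population has shifted
print([decide(p, 0.8) for p in (0.35, 0.8, 0.92)])  # hypothetical threshold of 0.8
```

Comparing the two sets of shares shows the kind of structural change described for assigned bins: shares that drift away from 10% indicate a different population, not necessarily a less accurate predictive model.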