Generating and Saving the Predictions for a Classification or Regression Predictive Model

Context

You want to generate and save the predictions for a predictive model of type classification or regression.

Procedure

  1. Open the relevant predictive model.
  2. Click Apply Predictive Model.

    The Apply Predictive Model window opens.

  3. In the Apply To Population section, select the application dataset you want to apply your predictive model to. This dataset must be prepared beforehand; it cannot be created at this step.
  4. In the Generated Dataset section, select the additional columns you want to have in your generated dataset:
    • Replicated Column: select which columns from the training data source should be replicated in the generated dataset.
      Restriction

      If your application dataset contains more columns than your training dataset, the additional columns are ignored by the application process.

    • Statistics & Predictions: This is information about your predictive model that you want to have in the generated dataset.
      Information | Description | Comments
      Apply Date | The start date of the predictive model application. | The type of the column is TIMESTAMP.
      Train Date | The start date of the predictive model training. | The type of the column is TIMESTAMP.
    • Statistics: select the statistics regarding the influencers you want to save in your dataset:

      Statistic

      Description

      Assigned Bin

      When selected, each individual in the application population is assigned to one of the reference quantiles (bins) defined on the validation population.

      Assigned bins explained: During training, the validation population is split into quantiles (bins), each defined by a range of scores. These bins serve as references (assigned bins) for an application population. When a predictive model is applied, each individual in the application population is allocated to an assigned bin based on its predicted score. Because each assigned bin represents 10% of the population used to define the bins, this percentage should remain stable on the application population as long as the population structure is unchanged. If it is not stable, this doesn't mean that the predictive model is no longer accurate, but rather that the structure of the population has changed. For example, there may be more or fewer potential churners now than in the past. The accuracy of the predictions should be monitored to back up the decisions.
      Note
      The number of bins is set to 10 and isn't customizable.

      See the section How does Smart Predict Create Assigned Bins? for information on using assigned bins.

      Outlier Indicator

      For each row in the application dataset, the Outlier Indicator is 1 if the row is an outlier with respect to the target, otherwise 0.

      An observation is considered an outlier when the prediction error is greater than 3 times the average prediction error found on similar observations.

    • Predictions: select the predictions to include in the output table:

      Prediction

      Description

      Predicted Category

      Classification predictive models (nominal target with 2 values only)

      For each row in the application dataset, the Predicted Category is the target category determined by the predictive model.

      The percentage of predicted target categories found in the application dataset corresponds to the Contacted Population percentage that is set by default when you open the Confusion Matrix.

      Any change you make in the Confusion Matrix does not affect the Predicted Category in the generated dataset.

      Alternatively, you can generate the Prediction Probability (instead of the Predicted Category) and apply a decision threshold to the probability value based on your business requirements (see How is a Decision Made For a Classification Result?).

      Prediction Probability

      Classification predictive models (nominal target with 2 values only)

      For each row in the application dataset, the Prediction Probability is the probability that the Predicted Category is the target value.

      Predicted Value

      Regression predictive models (continuous target)

      For each row in the application dataset, the Predicted Value is the value predicted for the target.

      Prediction Explanations

      Classification and regression predictive models

      For each row of the application dataset, the Prediction Explanations is a set of explanations for the prediction.

      Note
      If you do not select any statistics or predictions, only the target and the key influencer(s) are included.
    • Output as: Give a name to your generated dataset.
  5. Click Apply.
    The status of your predictive model is updated to Applied. You can find your generated dataset with the predictions in Recent Files (from the side navigation, choose Datasets > Recent Files) or on the Files page (from the side navigation, choose Files), where you can search for the file. You can then access your results directly by opening the generated dataset or, depending on your business needs, consume the output dataset in a BI story.
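
The statistics and predictions described in step 4 can also be checked outside Smart Predict once the generated dataset is available. The following Python sketch is illustrative only and is not part of the product. It assumes that the relevant columns of the generated dataset have been read into plain Python lists, that bins are labeled 1 to 10, that the Prediction Probability is a number between 0 and 1, and that the average prediction error on similar observations is supplied by the caller. The function names and the 0.03 tolerance are hypothetical choices made for this example.

```python
from collections import Counter

N_BINS = 10  # the number of assigned bins is fixed at 10


def bin_shares(assigned_bins, n_bins=N_BINS):
    """Return {bin: share of rows} for bins 1..n_bins.

    Assumption: bins are labeled with the integers 1..n_bins.
    """
    total = len(assigned_bins)
    if total == 0:
        raise ValueError("no rows to summarise")
    counts = Counter(assigned_bins)
    return {b: counts.get(b, 0) / total for b in range(1, n_bins + 1)}


def shifted_bins(shares, tolerance=0.03):
    """List the bins whose share differs from the expected 1/n by more
    than `tolerance` (a hypothetical threshold, not a product setting)."""
    expected = 1 / len(shares)
    return [b for b, share in shares.items() if abs(share - expected) > tolerance]


def apply_threshold(probabilities, threshold):
    """Return True for each row whose prediction probability reaches the
    decision threshold chosen from the business requirements."""
    return [p >= threshold for p in probabilities]


def outlier_indicator(error, mean_similar_error):
    """Return 1 if the prediction error is greater than 3 times the average
    error on similar observations, otherwise 0.

    Assumption: how "similar observations" are selected is not specified
    here, so their average error is passed in by the caller.
    """
    return 1 if abs(error) > 3 * mean_similar_error else 0
```

If shifted_bins(bin_shares(bins)) returns an empty list, every bin still holds roughly 10% of the application population, which suggests the population structure is unchanged. A non-empty list points to a change in population structure, not necessarily to a loss of accuracy.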