Understanding Predictive Goal and Training Roles for Variables
A variable corresponds to a column in a dataset or a dimension in a planning model, and the observations for each variable correspond to the rows. Variables specified as a target or an entity identifier are not considered influencers. All other variables are treated as influencers unless you exclude them. After training, only the most significant influencers are retained in the predictive model reports for debriefing.
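The role rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the product's implementation: the function name `candidate_influencers` and the column names `Temperature` and `Account number` are assumptions made up for the example.

```python
def candidate_influencers(columns, target, entity=None, excluded=()):
    """Return the columns that would be treated as influencers.

    Hypothetical helper: every column is an influencer unless it is the
    target, the entity identifier, or explicitly excluded.
    """
    reserved = {target} | set(excluded)
    if entity is not None:
        reserved.add(entity)
    # Preserve the original column order.
    return [c for c in columns if c not in reserved]


# Illustrative dataset columns (assumed, not taken from a real data source).
columns = ["Industry sector", "Energy consumption", "Temperature", "Account number"]

influencers = candidate_influencers(
    columns,
    target="Energy consumption",
    entity="Industry sector",
    excluded=["Account number"],
)
print(influencers)
```

Which of the remaining influencers are actually retained is decided by the training itself, so the list here is only the set of candidates.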
Role | Description | Example |
---|---|---|
Target | The variable whose values you want to explain or predict. |
Example
|
Date | The variable used for the date values. Note: This variable is mandatory for a time series predictive scenario. |
The date formats that should be used in your dataset are the following:
Here, YYYY stands for the year, MM for the month, DD for the day of the month, hh for the hour, mm for the minutes, and ss for the seconds. Note: If you use the YYYY-MM-DD date format, for example, you can create time series predictive scenarios where the date granularity can be:
|
Entity | Optionally used in a time series predictive scenario. It’s the identifier variable used to split the predictive model into entities, with each one producing its own predictive model, so you get distinct predictions for each entity. The predictive model can then capture behaviors that are specific to a given entity, and so produce more accurate predictions. The entity can be a dimension in the data, for example Region, Store, or Product Family. |
Example: You want to forecast the energy consumption by industry sector for the next 6 months. Your target is <Energy consumption> and your entity is <Industry sector>. You will get predictions and performance indicators for each industry sector: commercial, industrial, residential, and transportation. |
Influencer |
The influencers are variables that describe your data and serve to explain a target. Unless excluded, all variables that aren't already selected as a target or an entity identifier are considered influencers, with only the most significant ones retained after training for debriefing. During predictive model creation, you can exclude influencers from the training process. Excluded influencers are not taken into account to compute the predictive model, not included in the statistics for the predictive model, not retrieved from the data source, and not needed when you apply the predictive model to an application data source.
Remember: You should exclude influencers that are directly related to the target, especially variables that indirectly contain the target variable. Statisticians call these variables "leakers" or "leak variables". Keeping them produces an unreliable predictive model, with misleading performance indicators, that cannot produce usable predictions.
Example: If a predictive model has the target variable <has bought the product Yes/No>, you should exclude the influencer <Billing amount> if it contains the cost of the product.
Tip: If a variable influences the prediction to a very high degree, there is a chance that it is a leak variable. Excluding influencers that have no influence on the target (for example <account number>) can help speed up the training process. |
Example
Your company is marketing two products, A and B. You have a database that contains references to:
|
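The leak-variable tip in the Influencer row can be tried by hand before training. The sketch below is a manual screening heuristic, not how the predictive model itself detects anything: it checks whether a single numeric influencer can separate a Yes/No target almost perfectly with one threshold. The function name, the sample values, and the 0.95 cutoff are all assumptions chosen for illustration.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of a rule `value > t` (or its negation) on a boolean target."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, y in pairs if y)
    # Baseline: always predict the majority class.
    best = max(total_pos, n - total_pos) / n
    pos_below = 0
    for i, (v, y) in enumerate(pairs):
        pos_below += 1 if y else 0
        # Only place a threshold between two distinct values.
        if i + 1 < n and pairs[i + 1][0] == v:
            continue
        neg_below = (i + 1) - pos_below
        pos_above = total_pos - pos_below
        correct = neg_below + pos_above  # rule: predict True when value > v
        best = max(best, correct / n, (n - correct) / n)
    return best


# Made-up sample data mirroring the <Billing amount> example.
bought = [False, True, False, True, True, False]
influencer_values = {
    "account_number": [1001, 1002, 1003, 1004, 1005, 1006],
    "billing_amount": [0.0, 49.9, 0.0, 49.9, 99.8, 0.0],
}

LEAK_CUTOFF = 0.95  # assumed cutoff; tune it for your own data
suspects = [
    name
    for name, values in influencer_values.items()
    if best_threshold_accuracy(values, bought) >= LEAK_CUTOFF
]
print(suspects)
```

A flagged variable is only a candidate for exclusion. Whether it really contains the target still has to be judged from what the column means in the business context.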