Partition Strategy
A partition strategy is a technique that decomposes a training data source into two distinct
subsets:
- A training subset
- A validation subset
The partition is performed as follows:
- The row or dimension selection is random.
- The training subset contains 75% of the input rows or dimensions.
- The validation subset contains 25% of the input rows or dimensions.
With this partition strategy, the application can cross-validate the generated predictive models and keep the one that performs best on data it was not trained on.
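The random 75%/25% split described above can be illustrated with a minimal Python sketch. It is not the application's actual implementation: the function name `partition`, the `train_fraction` and `seed` parameters, and the sample rows are all hypothetical and only serve to show the idea.

```python
import random


def partition(rows, train_fraction=0.75, seed=None):
    """Randomly split rows into (training, validation) subsets.

    Hypothetical helper: the name and parameters are assumptions
    made for this illustration.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)          # random row selection
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))            # stand-in for 100 input rows
training, validation = partition(rows, seed=42)
print(len(training), len(validation))  # 75 25
```

Every input row ends up in exactly one of the two subsets, so the validation subset contains only rows that were not used for training.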
The following table defines the roles of the two data subsets obtained using partition
strategies.
This data subset | Is used to... |
---|---|
Training | Generate different predictive models. The predictive models generated at this stage are hypothetical. |
Validation | Select, among the predictive models generated using the training subset, the one that represents the best compromise between perfect quality and perfect robustness. |
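
The two roles in the table can be sketched as follows. This is an illustration under stated assumptions, not the application's method: the candidate models (`fit_mean`, `fit_line`), the helper `select_best`, the sample data, and the use of mean squared error as the only selection criterion are all hypothetical. The application's actual trade-off between quality and robustness is not specified here.

```python
def fit_mean(train):
    """Candidate 1 (hypothetical): always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate 2 (hypothetical): least-squares straight line."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared prediction error of a fitted model on rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def select_best(candidates, train, validation):
    """Fit every candidate on train, then keep the lowest validation error."""
    fitted = {name: fit(train) for name, fit in candidates.items()}
    scores = {name: mean_squared_error(m, validation) for name, m in fitted.items()}
    return min(scores, key=scores.get), scores


# Made-up (x, y) rows, already partitioned into the two subsets.
training = [(0, 1.0), (1, 3.1), (2, 4.9), (4, 9.0), (5, 11.1), (7, 15.1)]
validation = [(3, 7.2), (6, 13.0)]

best_name, scores = select_best(
    {"mean": fit_mean, "line": fit_line}, training, validation
)
print(best_name)  # line
```

The candidates are fitted on the training subset only. The validation subset is used solely to compare them, which is why a model that merely memorizes the training rows gains no advantage.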
Note
For Time Series Forecast, the validation subset allows you to calculate the
confidence interval (Error Min and Error Max) of the predictions.