Choose Between Datasets and Models

Depending on your business case, you can prepare your data using either a model or a dataset.

Before building your story, you need to make sure that your data is prepared for scenario analysis. With SAP Analytics Cloud’s wrangling experience, data preparation can be done using either a dataset or a model.

Both offer the same level of functionality and don’t affect the way the data is consumed in stories. However, the way you create and manage these data objects is different, because they serve different purposes.
Note
Datasets and remote models are not supported by the following story features: Smart Insight on a variance chart, and cross-calculations on tables and charts.

The graphic below summarizes the main differences between a model and a dataset:

Dataset: To create a predictive scenario in Smart Predict, you need a dataset. A dataset can be a dataset object in SAP Analytics Cloud or an embedded dataset if you import your data directly from a story. An embedded dataset allows you to toggle back and forth seamlessly between data preparation and story view in a flexible workflow. If a value in a cell does not comply with the data type of the column, only that particular value is removed from the story, not the whole row. Security is applied at the dataset object level: depending on your authorization level, you can either see or not see the data in the dataset as a whole. The structure is inherited from the source and you don’t have control over it.

Model: Planners already have the structure in mind and then either input the data or import it from different sources to fit into the model. A model is an SAP Analytics Cloud object. At a glance, you can see the data foundation, such as fact data, and the model’s dimensions surrounding the data foundation. When a model is created, a validation step identifies the values in a column that do not comply with the data type of the column, as well as values that do not follow the model creation rules, such as not allowing multiple descriptions for a single ID member, or not having circular dependencies when creating a hierarchy. If such issues are identified, the entire row is rejected from the model to ensure a certain level of data coherence in the model. You can decide how data can be uploaded and into which region of the model. You can set up data access control per dimension member.
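The difference in validation behavior described above can be illustrated with a small sketch. This is not SAP Analytics Cloud code: the function names (`dataset_style_clean`, `model_style_clean`) and the sample rows are hypothetical and only mimic the two behaviors, where a dataset drops the offending value and a model rejects the whole row.

```python
from typing import Any

def is_valid_number(value: Any) -> bool:
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def dataset_style_clean(rows: list, numeric_column: str) -> list:
    """Dataset-like behavior (illustrative): blank out only the bad cell, keep the row."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        if not is_valid_number(new_row[numeric_column]):
            new_row[numeric_column] = None
        cleaned.append(new_row)
    return cleaned

def model_style_clean(rows: list, numeric_column: str) -> tuple:
    """Model-like behavior (illustrative): reject the entire row if the cell is invalid."""
    accepted, rejected = [], []
    for row in rows:
        if is_valid_number(row[numeric_column]):
            accepted.append(dict(row))
        else:
            rejected.append(dict(row))
    return accepted, rejected

# Hypothetical sample data: the second row has a non-numeric Revenue value.
rows = [
    {"Region": "EMEA", "Revenue": "1200"},
    {"Region": "APJ", "Revenue": "n/a"},
    {"Region": "AMER", "Revenue": "950"},
]

dataset_rows = dataset_style_clean(rows, "Revenue")
accepted_rows, rejected_rows = model_style_clean(rows, "Revenue")
```

In this sketch, `dataset_rows` still contains all three rows, with an empty Revenue value for APJ, while `accepted_rows` contains only two rows and the APJ row ends up in `rejected_rows`.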

Identify Which Format Best Fits Your Needs

Datasets store data in a single table, and the semantic structure is defined by the metadata. Go for a dataset if you’re only looking to upload data, using a .csv or .xlsx file, and analyze it in a story straight away.

Models store data as a star schema, and the structure of the model is reflected in the database. Go for a model if the structure of the data is already set, or if you already have a structure in mind before importing the data to fit into the model. A model is preferred when you need to govern how data is processed, as is the case in planning.
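As a rough illustration of the two storage layouts, the sketch below holds the same data once as a single table with metadata and once as a star schema. The column names, the metadata layout, and the `to_star_schema` helper are assumptions made for this example and do not reflect how SAP Analytics Cloud stores data internally.

```python
# Dataset-like layout (illustrative): one flat table, semantics live in metadata.
flat_table = [
    {"Product": "P1", "ProductName": "Laptop", "Country": "DE", "Sales": 100.0},
    {"Product": "P2", "ProductName": "Phone", "Country": "DE", "Sales": 50.0},
    {"Product": "P1", "ProductName": "Laptop", "Country": "FR", "Sales": 70.0},
]
metadata = {
    "Product": {"role": "dimension", "description": "ProductName"},
    "Country": {"role": "dimension"},
    "Sales": {"role": "measure"},
}

def to_star_schema(table: list, meta: dict) -> tuple:
    """Split a flat table into dimension tables and a fact table (toy version)."""
    dimension_tables = {}
    fact_columns = list(meta)  # dimension keys and measures stay in the fact table
    for column, info in meta.items():
        if info["role"] != "dimension":
            continue
        description_column = info.get("description")
        members = {}
        for row in table:
            # One entry per member ID, with its description if one is defined.
            members[row[column]] = row[description_column] if description_column else None
        dimension_tables[column] = members
    fact_table = [{c: row[c] for c in fact_columns} for row in table]
    return dimension_tables, fact_table

# Model-like layout (illustrative): the structure is materialized as separate tables.
dimension_tables, fact_table = to_star_schema(flat_table, metadata)
```

In the flat layout, changing the semantics means editing `metadata`. In the star layout, the structure exists as separate tables, which is why structural changes to a model are more involved.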

Advantages of Using a Model

Models guarantee that the data they hold follows a series of business rules, which ensures that workflows such as planning can be run. Changes to the structure of the model can be made either at the structure level, if the fact table is empty, or by rebuilding the model from the original data preparation session.

Models also support row-level security and fine-grained data management of dimensions and fact tables.
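The contrast between object-level security on a dataset and access control per dimension member on a model can be sketched as follows. The user names, the permission tables, and both helper functions are hypothetical and only illustrate the idea; they are not part of any SAP Analytics Cloud API.

```python
fact_rows = [
    {"Country": "DE", "Sales": 100.0},
    {"Country": "FR", "Sales": 70.0},
    {"Country": "DE", "Sales": 50.0},
]

def visible_dataset_rows(user: str, rows: list, readers: set) -> list:
    """Dataset-like security (illustrative): all rows or nothing."""
    return list(rows) if user in readers else []

def visible_model_rows(user: str, rows: list, dimension: str, member_readers: dict) -> list:
    """Model-like security (illustrative): filter rows by dimension member."""
    return [
        row for row in rows
        if user in member_readers.get(row[dimension], set())
    ]

# Hypothetical permissions: ben may read the dataset as a whole,
# but in the model he may only read the member FR of the Country dimension.
dataset_readers = {"anna", "ben"}
country_readers = {"DE": {"anna"}, "FR": {"anna", "ben"}}

ben_dataset_view = visible_dataset_rows("ben", fact_rows, dataset_readers)
ben_model_view = visible_model_rows("ben", fact_rows, "Country", country_readers)
carl_dataset_view = visible_dataset_rows("carl", fact_rows, dataset_readers)
```

Here `ben_dataset_view` contains every row, `ben_model_view` contains only the FR row, and `carl_dataset_view` is empty because that user has no access to the dataset object.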

Advantages of Using a Dataset

Datasets are reactive to change: any modifications you make to the data or data structure are done simply by editing the dataset, without data loss or restriction.

Example
Say you want to change the data type of a field from a Dimension to a Measure. If you're using a dataset, only the metadata definition of that column needs to be changed. If you're using a model, it's more time-consuming: you need to delete the dimension table and update the fact table to include an additional column.
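A toy sketch of this example follows. The data structures and the two helper functions are invented for illustration and do not correspond to actual SAP Analytics Cloud operations. They only show why the dataset change touches metadata alone, while the model change touches both the dimension tables and the fact table.

```python
# Dataset-like object (illustrative): the column role is just a metadata entry.
dataset_metadata = {"Country": "dimension", "Quantity": "dimension", "Sales": "measure"}

def dataset_to_measure(metadata: dict, column: str) -> dict:
    """Change a column's role by editing metadata only; the data is untouched."""
    updated = dict(metadata)
    updated[column] = "measure"
    return updated

# Model-like object (illustrative): dimensions are separate tables around a fact table.
model = {
    "dimension_tables": {"Country": ["DE", "FR"], "Quantity": ["3", "5"]},
    "fact_table": [
        {"Country": "DE", "Quantity": "3", "Sales": 100.0},
        {"Country": "FR", "Quantity": "5", "Sales": 70.0},
    ],
    "measures": ["Sales"],
}

def model_to_measure(source: dict, column: str) -> dict:
    """Delete the dimension table and give the fact table a numeric measure column."""
    dimension_tables = {
        name: members
        for name, members in source["dimension_tables"].items()
        if name != column
    }
    fact_table = []
    for row in source["fact_table"]:
        new_row = dict(row)
        key_value = new_row.pop(column)      # drop the reference to the old dimension
        new_row[column] = float(key_value)   # add the value as a measure column
        fact_table.append(new_row)
    return {
        "dimension_tables": dimension_tables,
        "fact_table": fact_table,
        "measures": source["measures"] + [column],
    }

new_metadata = dataset_to_measure(dataset_metadata, "Quantity")
new_model = model_to_measure(model, "Quantity")
```

The dataset change is a single metadata update, whereas the model change rewrites every fact row and removes a table.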
Note
Datasets can be converted into models.