Data Subsetting

Anyone who wants to test software quickly and effectively needs a manageable set of test data. A complete production database is often unsuitable because of its size and complexity. Manually cutting out a portion does not work either, because it destroys the representativeness of the data. In such cases, data subsetting is the solution.

What is data subsetting?

Data subsetting uses an automated tool to make a representative selection from the production data.

The result is a test set that is manageable, unlike the large and complex full data set.

When a data subset is created, the connections between the data (the referential integrity) are maintained: if a field refers to a field in another table, that reference is preserved. This yields a representative test set that does not produce unnecessary errors caused by broken references. Data types are also preserved: a numeric field remains numeric.
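As a simplified illustration of the principle (not the implementation of the Subsetting module), the sketch below uses Python's built-in sqlite3 module. It assumes a hypothetical schema with customers and orders; the helper names build_source, make_subset and check_integrity and the filter on country are all illustrative. The subset first selects "driving" rows from the parent table, then copies only those child rows whose foreign key points to a selected parent. The schema is copied verbatim, so column types stay the same, and SQLite's PRAGMA foreign_key_check confirms that no reference is left dangling.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create a small in-memory 'production' database (hypothetical schema)."""
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
    """)
    # 100 customers, alternating between two countries
    src.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(i, "NL" if i % 2 else "BE") for i in range(1, 101)])
    # 500 orders, five per customer
    src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(i, (i % 100) + 1, i * 1.5) for i in range(1, 501)])
    src.commit()
    return src


def make_subset(src: sqlite3.Connection, country: str, limit: int) -> sqlite3.Connection:
    """Copy a filtered selection of customers plus only the orders that reference them."""
    dst = sqlite3.connect(":memory:")
    dst.execute("PRAGMA foreign_keys = ON")  # reject any row with a broken reference
    # Reuse the original CREATE TABLE statements so data types stay unchanged
    for (sql,) in src.execute("SELECT sql FROM sqlite_master WHERE type = 'table'"):
        dst.execute(sql)
    # Step 1: select the driving rows with a freely definable criterion
    customers = src.execute(
        "SELECT id, country FROM customers WHERE country = ? ORDER BY id LIMIT ?",
        (country, limit),
    ).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Step 2: follow the foreign key and keep only orders of selected customers
    ids = [row[0] for row in customers]
    placeholders = ",".join("?" * len(ids))
    orders = src.execute(
        f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()
    return dst


def check_integrity(db: sqlite3.Connection) -> bool:
    """True when no row references a missing parent row."""
    return db.execute("PRAGMA foreign_key_check").fetchall() == []


if __name__ == "__main__":
    subset = make_subset(build_source(), "NL", 10)
    print(subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 10
    print(subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0])     # 50
    print(check_integrity(subset))                                         # True
```

Here 10 of the 100 customers and their 50 orders (out of 500) end up in the subset. Selecting orders at random instead would quickly produce orders whose customer is missing, which is exactly the kind of broken reference a subsetting tool is meant to avoid.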

Benefits of data subsetting

  • Testing is faster because storing, loading and testing against a smaller data set takes less time.
  • A complete database consumes considerable bandwidth; a subset consumes far less. It also requires less storage, less hardware and fewer licenses.
  • The subset is representative of the entire database because both references and data types (such as numeric fields) remain unchanged.
  • You can refresh and provision test data more easily and quickly for different testers and test purposes.
  • It helps you comply with the proportionality principle of European privacy legislation (the GDPR): the size of the test set is tailored to the nature and purpose of the test.

Product specifications: data subsetting

General characteristics

  • Easy to implement
  • Quick to roll out
  • Low operating costs
  • Speeds up the development cycle
  • Aligns with agile working
  • Lower data storage costs

Functional properties

  • Preserves the referential integrity of the data set
  • Data quality remains unchanged
  • Can be expanded with synthetic data
  • Freely definable subset criteria

Technical properties

  • Easily scalable
  • High performance
  • Cross-platform
  • Minimal management effort
  • Easy integration with CI/CD pipelines

Our solution

Our automated subsetting solution is easy to use and relatively quick to implement. Combined with the Data Masking module, the Subsetting module offers a solution for anyone who quickly needs a representative data set. The module supports various databases, including Oracle, Microsoft SQL Server and IBM DB2.

