Data Subsetting

Anyone who wants to test software quickly and effectively needs a manageable set of test data. The complete database is often unsuitable because of its size and complexity. Manually cutting out a piece does not work either, because the result is no longer representative of the data. In such a case, data subsetting is the solution.

What is data subsetting?

When creating a data subset, an automated tool makes a representative selection of the production data.

This creates a test set that is manageable, in contrast to the large and complex total data set.

When creating a data subset, the connections between data (the referential integrity) are maintained: if a field refers to a field in another table, this reference is maintained. This creates a representative test set that will not generate unnecessary errors due to incorrect references. Data types also remain the same: a numeric field remains numeric.
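As a rough illustration of the principle described above, the sketch below builds a small subset with Python's built-in sqlite3 module. It is not how any particular subsetting product works: the tables `customers` and `orders`, the `sub_` prefix, and the naive "every n-th customer" selection rule are all assumptions made for this example. The point is only that child rows are selected by following the reference to the parent rows already in the subset, so no reference is left dangling and column types stay the same.

```python
import sqlite3


def build_source(conn):
    """Create a small, hypothetical 'production' schema with sample rows."""
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, i * 1.5) for i in range(1, 501)],
    )


def make_subset(conn, every_nth=10):
    """Copy a slice of customers plus exactly the orders that refer to them."""
    # The subset tables reuse the source column types (numeric stays numeric).
    conn.executescript("""
        CREATE TABLE sub_customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE sub_orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES sub_customers(id),
            amount      REAL NOT NULL
        );
    """)
    # Naive selection rule (an assumption): keep every n-th customer.
    conn.execute(
        "INSERT INTO sub_customers SELECT * FROM customers WHERE id % ? = 0",
        (every_nth,),
    )
    # Follow the foreign key: keep only orders whose customer is in the subset.
    conn.execute("""
        INSERT INTO sub_orders
        SELECT o.* FROM orders o
        JOIN sub_customers c ON c.id = o.customer_id
    """)


def orphan_count(conn):
    """Count subset orders that point at a customer missing from the subset."""
    row = conn.execute("""
        SELECT COUNT(*) FROM sub_orders o
        LEFT JOIN sub_customers c ON c.id = o.customer_id
        WHERE c.id IS NULL
    """).fetchone()
    return row[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_source(connection)
    make_subset(connection)
    print("orphaned orders in subset:", orphan_count(connection))
```

A real subsetting tool has to handle many tables, circular references and a selection rule that actually keeps the data representative; this sketch only covers a single parent-child relation.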

Benefits of data subsetting

  • Testing is faster because less time is needed for storing the data and running the tests.
  • A complete database consumes a lot of bandwidth; a subset consumes far less. It also requires less storage, less hardware and fewer licenses.
  • The subset is representative of the entire database because both references and data types (such as numeric fields) remain unchanged.
  • You can refresh and apply the test data more easily and quickly for different testers and testing purposes.
  • You comply with the proportionality principle of European privacy legislation: the size of the test set is tailored to the nature and purpose of the test.

Product specifications for data subsetting

General characteristics

  • Easy to implement
  • Quick to roll out
  • Low operating costs
  • Speeds up the development cycle
  • Aligns with agile working
  • Savings for data storage

Functional properties

  • Preserves the referential integrity of the data set
  • Data quality remains unchanged
  • Can be expanded with synthetic data
  • Subset freely definable

Technical properties

  • Easily scalable
  • High performance
  • Cross-platform
  • Minimal management effort
  • Easy integration with CI/CD pipeline

Our solution

Our automated subsetting solution is easy to use and relatively quick to implement. In combination with the Data Masking module, the Subsetting module offers a solution for anyone who needs a representative dataset quickly. The module can be used with various databases, such as Oracle, Microsoft SQL Server and IBM DB2.

Would you like to receive more information? Please contact us!
