Data Subsetting
Anyone who wants to test software quickly and effectively needs a manageable set of test data. The complete production database is often unsuitable because of its size and complexity. Manually cutting out a piece does not work either, because the result is no longer representative of the data. In such cases, data subsetting is the solution.
What is data subsetting?
Data subsetting uses an automated tool to make a representative selection from the production data.
This creates a test set that is manageable, in contrast to the large and complex total data set.
The subset preserves the relationships between data (referential integrity): if a field refers to a field in another table, that reference still resolves in the subset. The result is a representative test set that does not produce spurious errors caused by broken references. Data types also stay the same: a numeric field remains numeric.
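The idea can be illustrated with a minimal sketch. It is not the product's implementation: the `customers` and `orders` tables, the "every Nth customer" selection rule and all function names are assumptions made for this example, and SQLite stands in for a production database. The sketch copies the schema unchanged, selects a slice of parent rows, copies only the child rows that refer to them, and then checks that no reference is broken.

```python
import sqlite3


def build_source(conn):
    """Create a toy 'production' database (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer-{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, (i % 100) + 1, i * 1.5) for i in range(1, 501)])
    conn.commit()


def make_subset(src, dst, every_nth=10):
    """Copy every Nth customer plus the orders that refer to them."""
    dst.execute("PRAGMA foreign_keys = ON")  # let the target enforce references
    # Reuse the original DDL so column types stay identical in the subset.
    for (ddl,) in src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall():
        dst.execute(ddl)
    # Parent rows first: the selection rule here is an illustrative assumption.
    customers = src.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (every_nth,)
    ).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Child rows second: only those whose parent is part of the subset.
    ids = [row[0] for row in customers]
    placeholders = ",".join("?" * len(ids))
    orders = src.execute(
        "SELECT id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({placeholders})", ids
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()


def orphan_count(conn):
    """Number of orders whose customer is missing (should be zero)."""
    return conn.execute("""
        SELECT COUNT(*) FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        WHERE c.id IS NULL
    """).fetchone()[0]


def column_types(conn, table):
    """Declared (name, type) pairs for a table; table name is trusted input."""
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    subset = sqlite3.connect(":memory:")
    build_source(source)
    make_subset(source, subset)
    print("orders in subset:",
          subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
    print("broken references:", orphan_count(subset))
    print("types unchanged:",
          column_types(source, "orders") == column_types(subset, "orders"))
```

With this toy data, 10 of the 100 customers and their 50 orders end up in the subset. A real schema has many tables and longer reference chains, so a subsetting tool has to follow every foreign key rather than a single hard-coded relationship.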
![](https://entrd.nl/wp-content/uploads/2023/09/entrd-document-scan.png)
![](https://entrd.nl/wp-content/uploads/2023/09/entrd-document-lees.png)
Benefits of data subsetting
- Testing is faster because less time is needed to store the data and run the tests.
- A subset consumes far less bandwidth than a complete database, and it requires less storage, less hardware and fewer licenses.
- The subset is representative of the entire database because both references and data types (such as numeric fields) remain unchanged.
- You can refresh and apply the test data more easily and quickly for different testers and testing purposes.
- You comply with the proportionality principle of European privacy legislation: the size of the test set is tailored to the nature and purpose of the test.
Product specifications for data subsetting
General characteristics
- Easy to implement
- Quick to roll out
- Low operating costs
- Speeds up the development cycle
- Aligns with agile working
- Savings on data storage
Functional properties
- Preserves referential integrity of the data set
- Data quality remains unchanged
- Can be expanded with synthetic data
- Subset freely definable
Technical properties
- Easily scalable
- High performance
- Cross-platform
- Minimal management effort
- Easy integration with CI/CD pipeline
Our solution
Our automated subsetting solution is easy to use and relatively quick to implement. In combination with the Data Masking module, the Subsetting module offers a solution for anyone who needs a representative dataset quickly. The module can be used on various databases such as Oracle, Microsoft SQL Server and IBM DB2.
Would you like to receive more information? Please contact us!