3 ways to make your test data GDPR compliant

By now we all know that you can’t simply use production data (or a copy of it) for testing purposes. The GDPR restricts the use of personally identifiable information for secondary purposes such as marketing, training and testing, because there is a chance that personal data will end up in the wrong hands or be leaked. But when you’re a software tester or quality engineer, you need (production-like) data for your tests, right? How else can you test your applications and make sure they will work properly in production? Fortunately there are three (easy) ways to make your test data GDPR compliant, so you can keep testing with high-quality data:

Data masking
Synthetic data generation
Combination of techniques

Data masking

Data masking is the process of hiding privacy-sensitive data that is stored in your database. The main goal is that personal information like names, addresses, IBANs, Social Security numbers and salaries is no longer traceable to a natural person. But what makes information personal or privacy sensitive? A name in itself is not privacy sensitive, but the fact that the person with this name has a giant debt is. With the help of data masking tools you make sure that different pieces of personal information are no longer linked. You can shuffle names, scramble text or numbers, set birthdays to the first day of the same month or year (so the birth year remains functional), use custom expressions, blank fields you don’t need for your tests, and more. All these masking rules (which are combined in a masking template) help you make sure that your data can no longer be traced back to a person.
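To make a few of these masking rules concrete, here is a minimal sketch in Python. It is not how any particular masking tool works; the column names (`name`, `birthday`, `notes`) and the function `mask_rows` are assumptions made up for illustration.

```python
import random
from datetime import date


def mask_rows(rows, seed=0):
    """Apply three illustrative masking rules to a list of dict rows."""
    rng = random.Random(seed)

    # Rule 1: shuffle names across rows so a name no longer belongs
    # to the rest of its original record. (A shuffle can leave a name
    # in its original row, so real tools usually add extra checks.)
    names = [row["name"] for row in rows]
    rng.shuffle(names)

    masked = []
    for row, name in zip(rows, names):
        masked.append({
            **row,
            "name": name,
            # Rule 2: set the birthday to the first day of the same month,
            # so the birth month and year stay usable in tests.
            "birthday": row["birthday"].replace(day=1),
            # Rule 3: blank a free-text field the tests do not need.
            "notes": None,
        })
    return masked


# Hypothetical sample rows standing in for a production table.
rows = [
    {"id": 1, "name": "Anna Jansen", "birthday": date(1984, 3, 17), "notes": "called about debt"},
    {"id": 2, "name": "Pieter de Vries", "birthday": date(1991, 11, 5), "notes": "VIP"},
    {"id": 3, "name": "Sara Bakker", "birthday": date(1975, 7, 29), "notes": ""},
]
masked = mask_rows(rows)
```

The original `rows` list is left untouched, which makes it easy to compare the masked result against the source while you tune the rules.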

Synthetic data generation

As an alternative to data masking you can choose to synthetically generate data for your test database. Synthetic data can be used as a masking technique, or you can generate data from scratch if you don’t have any data in production yet (when you’re testing a brand new application, for instance). Synthetically generated data is also helpful when you have outliers or specific cases in your dataset. The highest salary, for example, can easily be linked to a particular person (you probably have an idea which employee of your organization earns the most). By using synthetic data for these fields, you can rule out these cases of accidental recognition.
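Both uses of synthetic data can be sketched in a few lines of Python. The name lists, the salary range and the functions `generate_employees` and `replace_outlier_salaries` are all assumptions for illustration, not part of any specific tool.

```python
import random

# Hypothetical value pools; a real generator would use larger lists.
FIRST_NAMES = ["Alex", "Robin", "Sam", "Kim", "Noor"]
LAST_NAMES = ["Smit", "Visser", "Mulder", "Bos", "Dekker"]


def generate_employees(count, seed=0):
    """Generate employee rows from scratch, without any production data."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            # Salaries between 30000 and 90000 in steps of 500 (assumed range).
            "salary": rng.randrange(30000, 90001, 500),
        }
        for i in range(1, count + 1)
    ]


def replace_outlier_salaries(rows, cap, seed=0):
    """Replace salaries above `cap` with a synthetic value at or below it."""
    rng = random.Random(seed)
    result = []
    for row in rows:
        if row["salary"] > cap:
            # The outlier would be recognisable, so draw a new value instead.
            row = {**row, "salary": rng.randrange(cap // 2, cap + 1, 500)}
        result.append(dict(row))
    return result


employees = generate_employees(5)
safe = replace_outlier_salaries(
    [{"id": 99, "name": "Top Earner", "salary": 250000}], cap=90000
)
```

Using a fixed seed keeps the generated set reproducible between test runs; drop the seed if you prefer fresh data each time.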

Combination of techniques

We probably don’t need to explain why using a combination of data masking and synthetic data generation is the best method. You get the best of both worlds. On the one hand your data (and the data structure) remains intact as much as possible, so that it remains close to production. On the other hand, you use all the advantages that synthetic data brings.
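As a rough sketch of what such a combination could look like, the Python example below applies one template in which some columns get a masking rule and others get a synthetic value, while untouched columns keep the production structure. The column names, the rules and `apply_template` are hypothetical.

```python
import random


def scramble_digits(value, rng):
    """Masking rule: replace every digit with a random one, keep the format."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def synthetic_salary(value, rng):
    """Synthetic rule: ignore the real value and draw a plausible one."""
    return rng.randrange(30000, 90001, 500)


# The template maps a column to its rule; columns not listed stay as they are.
TEMPLATE = {"phone": scramble_digits, "salary": synthetic_salary}


def apply_template(rows, template, seed=0):
    """Return new rows with every templated column passed through its rule."""
    rng = random.Random(seed)
    return [
        {
            col: (template[col](val, rng) if col in template else val)
            for col, val in row.items()
        }
        for row in rows
    ]


# Hypothetical production-like row.
source = [{"id": 7, "department": "Finance", "phone": "06-1234-5678", "salary": 250000}]
test_data = apply_template(source, TEMPLATE)
```

Keeping the rules in one template makes it clear, per column, whether a value was masked, generated or left alone.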

Conclusion

Both data masking tools and synthetic data generation will help you get your test data GDPR compliant. The best way is to use a combination of techniques to get a test data set that is both compliant and functional (high quality). With a good data masking tool you not only mask (or generate) your data, but you also generate an audit report with which you can demonstrate your masking efforts. That is a useful document to hand over to the privacy authorities if they come to check.
