IT departments maintain and use both 'Production' (what is used to run the business) and Testing environments. The testing environments need data to test with. And where do you think most of the testing data comes from? In the 'real world', it is most likely 'real' credit card numbers (which the Payment Card Industry Data Security Standard, PCI DSS, does not allow), Tax Identification numbers, etc.
And to further complicate matters, testing, by its very nature, means easier access to the data by Developers, Testers, IT operations, etc. This gives us the exposure that businesses try so hard to avoid. And you may not even know about it.
So let's take a look at some legal ramifications of this matter.
An example is Canada, where one of the principal laws governing privacy is the Personal Information Protection and Electronic Documents Act (PIPEDA). Basically (and this is an oversimplification, but it is good enough for this discussion), the company may use the Personally Identifiable Information (PII) it gathers solely for the purposes it has advised the user of. So if a user goes into a bank to open an account, as an example, he/she has to sign a 'whole bunch' of papers and, more often than not, gets a copy of them to take home to wallpaper the house (I know, it's a bad joke). Realistically, these statements are only read by a lawyer or a privacy specialist.
I guarantee that there is no place in that document that states the company may use the information for testing purposes. And don't forget the looser security requirements of the testing world.
Now this particular aspect is worth a book in itself, but let's just leave it for now and, if you, the reader, agree, we can try to figure out what needs to be done, and the benefits/costs of each solution.
1) Well, let's create the test material needed and not rely on ANY real data.
You will not need to worry about relaxed security restrictions, because the information does not represent any real person. So even if printed reports are found in the trash bin, there will be no worries.
The data is 'easy' to create.
The 'quality' of the made-up data is a concern. Is the data a good sampling of the various permutations and combinations of the different aspects of your customers? For example, do you have customers who live in NYC (Hong Kong, Budapest, Montreal, etc.) and who have a chequing account in the spouse's name as well as two children's accounts, etc.? If you do not cover all the different variations that exist, how do you know that your testing is complete and will be able to discover failures before implementation?
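To make the coverage question concrete, here is a minimal sketch of generating made-up customers so that every combination of a few attributes appears at least once. The attribute lists, field names and the `make_test_customers` function are my own assumptions for illustration; a real list would come from analysing your own customer base, and the number of combinations grows quickly as attributes are added.

```python
import itertools
import random

# Hypothetical attribute values, for illustration only.
CITIES = ["NYC", "Hong Kong", "Budapest", "Montreal"]
ACCOUNT_TYPES = ["chequing", "savings"]
ACCOUNT_HOLDERS = ["self", "spouse"]
CHILD_ACCOUNTS = [0, 1, 2]


def make_test_customers(seed=42):
    """Build one made-up customer per combination of the attributes above."""
    rng = random.Random(seed)  # seeded so every test run sees the same data
    combos = itertools.product(CITIES, ACCOUNT_TYPES, ACCOUNT_HOLDERS, CHILD_ACCOUNTS)
    customers = []
    for i, (city, account_type, holder, children) in enumerate(combos, start=1):
        customers.append({
            "customer_id": f"TEST-{i:05d}",  # obviously fake identifier
            "name": f"Test Customer {i}",    # no real person behind it
            "city": city,
            "account_type": account_type,
            "account_holder": holder,
            "child_accounts": children,
            "balance": round(rng.uniform(0, 10000), 2),
        })
    return customers


def covered_combinations(customers):
    """Return the set of attribute combinations present in the data."""
    return {
        (c["city"], c["account_type"], c["account_holder"], c["child_accounts"])
        for c in customers
    }
```

Comparing `covered_combinations(...)` against the combinations you know exist in production (counted, not copied) is one way to spot gaps in the made-up data.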
2) Copy Real Data for use in testing
You will be testing with real data, and if there are any issues, they will be discovered before the change is put into 'production'. If the tests work, then there is no reason why the change will not work in production.
As previously discussed, chances are that you are close to breaking some laws (if any of the information in question is PII).
Data volume is another concern. Who nowadays, large or small business, has the capacity to copy the entire production data set for use in testing? And most major companies may have many testing environments to help them move forward.
Then there is the extra time you will need for multiple rounds of testing with large amounts of data. (Another topic in my series of blogs in the future will be about volumes of data, testing types, and the related issues/solutions.)
The reduced security (see above) around testing allows increased access to the data. This could increase the chances of a data breach.
If there is a data breach, your company's reputation would suffer, and its name may appear on the front page of the local/national newspaper. The loss of customer confidence in your organization may also affect the bottom line. This can cost millions of dollars and loss of business (all depending on the number of records exposed).
3) Copy Real Data for use in testing and have everyone sign non-disclosure agreements.
You now use real data, with all its different combinations, to test with, and you have the legal protection of a non-disclosure agreement.
According to some studies, over 70% of all data breaches are non-malicious, and therefore agreements of this sort would not stop a breach.
We are also still looking at large volume issues.
Real data may not have all the information you need to test properly (testing for error handling, as an example).
4) Copy and obfuscate (scrub) the PII data so no one can figure out who the real data record represents
You get real data to work with, and even if a report ends up in a trash bin, no one can figure out who the data identifies or belongs to.
You will need to have a full understanding of your data.
You will have to do analysis work on how to scrub the data.
You will need to understand how the PII data elements work together within your environment/application.
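As a rough illustration of why that understanding matters, here is a minimal sketch of one possible scrubbing approach: replacing PII values with a keyed hash, so that the same real value always becomes the same scrubbed value and records that were linked before scrubbing stay linked afterwards. This is only one technique among several, and the field names, the `SECRET_KEY` value and the `scrub_value` / `scrub_record` functions are assumptions of mine, not part of any particular product.

```python
import hashlib
import hmac

# Hypothetical key: it would have to be kept out of the testing environment,
# otherwise someone could re-compute the tokens for guessed values.
SECRET_KEY = b"replace-with-a-secret-kept-out-of-test"

# Hypothetical list of the fields that hold PII in your records.
PII_FIELDS = ("name", "tax_id")


def scrub_value(field, value, key=SECRET_KEY):
    """Replace a PII value with a keyed-hash token (same input, same token)."""
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{field}_{digest[:12]}"


def scrub_record(record, pii_fields=PII_FIELDS, key=SECRET_KEY):
    """Return a copy of the record with PII fields scrubbed, others untouched."""
    return {
        name: scrub_value(name, value, key) if name in pii_fields else value
        for name, value in record.items()
    }
```

Because the token depends only on the field name and the original value, a customer record and an account record that share a tax ID will still share the same token after scrubbing. The scrubbed values no longer look like real names or tax IDs, though, so an application that validates formats would need a format-preserving approach instead.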
In my next blog I will further investigate all of the above options and discuss which option may be the most suitable for your situation. Maybe a hybrid solution could be the answer.
If you have any comments or questions, feel free to drop me a line.
As a note, this blog is not intended to be legal advice.