Wednesday, March 6, 2013

Testing and Data Privacy: Is There an Issue? (Part II)

So here we are now. Let's recap some of the major points we covered previously before we go on.

IT departments maintain and use both 'Production' (what is used to run the business) and Testing environments. They need data to test with. And where do you think most of the testing data comes from? In the 'real world', it is most likely 'real' data: credit card numbers (which the Payment Card Industry Data Security Standard, PCI DSS, does not allow), Tax Identification numbers, etc.

And to further complicate matters, testing, by its very nature, means easier access to the data by Developers, Testers, IT operations, etc. This gives us the exposure that businesses try so hard to avoid. And you may not even know about it.

So let's take a look at some legal ramifications of this matter.

An example is Canada, where one of the principal laws governing privacy is the Personal Information Protection and Electronic Documents Act (PIPEDA). Basically (and this is an oversimplification, but it is good enough for this discussion), the company may use the Personally Identifiable Information (PII) it gathers solely for the purposes it has advised the user of. So if a user goes into a bank to open an account, as an example, he/she has to sign a 'whole bunch' of papers, and more often than not gets a copy of them to take home to wallpaper the house. (I know it's a bad joke. Realistically, these statements are only read by a lawyer or a privacy specialist.)

But in all seriousness, at least one of these documents (best practices) is basically an agreement made with the bank that allows the bank to gather the information it needs to provide the service you are requesting. It also states who the bank may share that information with and how it will protect it, and hopefully lists a department/person to contact in case one has any questions about the privacy policy of the company.

I guarantee that there is no place in that document that states the company may use the information for testing purposes. And don't forget the looser security requirements of the testing world.

If you think that this is only for Canada, you will be mistaken, big time. As another example, in the EU one of the applicable 'laws' is Directive 95/46/EC (more commonly known as the EU Directive on Data Protection). It is one of the most stringent privacy laws there is. And don't be fooled into thinking that just because you do not have any offices in the EU or Canada, etc., you don't have to worry about it. In fact, if you have any customers from the EU, or collect some information while they are on your website, you may still be under their privacy jurisdiction.

Now this particular aspect is worth a book in itself, but let's just leave it for now, and if you, the reader, agree, we can try to figure out what needs to be done, and the benefits/costs of each solution.

1) Well, let's create the test material needed and not rely on ANY real data.

The Pros:

You will not need to worry about relaxed security restrictions, because the information does not represent any real person. So even if printed reports are found in the trash bin, there will be no worries.

The data is 'easy' to create.

The Cons:

'Quality' of the made-up data. Is the data a good sampling of the various permutations and combinations of the different aspects of your customers? For example, do you have customers who live in NYC (Hong Kong, Budapest, Montreal, etc.) and who have a chequing account in the spouse's name as well as two children's accounts, etc.? If you do not cover all the different variations that exist, how do you know that your testing is complete and will be able to discover failures before implementation?
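To make the 'permutations and combinations' concern concrete, here is a minimal sketch of generating made-up customers that cover every combination of a few attributes. Everything in it is an assumption for illustration: the attribute lists, the field names, the `TEST-` identifiers and the `999999` card prefix are hypothetical and not taken from any real system. The card numbers are random digits with a check digit computed by the Luhn algorithm so that they pass basic format validation; check the prefix against your card scheme's rules before relying on it.

```python
import itertools
import random

# Hypothetical attribute values; replace them with the variations your business really has.
CITIES = ["New York", "Hong Kong", "Budapest", "Montreal"]
CHEQUING_HOLDERS = ["self", "spouse", "joint"]
CHILD_ACCOUNTS = [0, 1, 2]


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Return True if the full number (including check digit) passes the Luhn test."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_card_number(rng: random.Random) -> str:
    """Build a 16-digit made-up card number with a placeholder prefix."""
    body = "999999" + "".join(str(rng.randrange(10)) for _ in range(9))
    return body + luhn_check_digit(body)


def synthetic_customers(seed: int = 0) -> list:
    """Create one made-up customer per combination of the attributes above."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    combos = itertools.product(CITIES, CHEQUING_HOLDERS, CHILD_ACCOUNTS)
    customers = []
    for n, (city, holder, children) in enumerate(combos, start=1):
        customers.append({
            "customer_id": f"TEST-{n:05d}",
            "name": f"Test Customer {n}",
            "city": city,
            "chequing_holder": holder,
            "child_accounts": children,
            "card_number": fake_card_number(rng),
        })
    return customers


if __name__ == "__main__":
    for customer in synthetic_customers()[:3]:
        print(customer)
```

The catch is exactly the one described above: the generator only covers the variations someone thought to put in the lists, so the hard part is knowing your customers, not writing the loop.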

2) Copy Real Data for use in testing

The Pros:

You will be testing with real data, and if there are any issues, they will be discovered before the change is put into 'production'. If the tests work, then there is no reason why it will not work in production.

The Cons:

As previously discussed, chances are that you are close to breaking some laws  (if any of the information in question is PII).

Data volume is another concern. Who nowadays, large or small business, has the capacity to copy the entire production data for use in testing? And most major companies may have many testing environments to help them move forward.

Then there is the extra time you will need for multiple test runs against large amounts of data. (Another topic in my future series of blogs will be volumes of data, testing types, and the related issues/solutions.)

The reduced Security (see above) around the testing will allow increased access. This could increase the chances of a Data Breach.

If there is a Data Breach, your company's reputation would suffer, and its name may appear on the front page of the local/national newspaper, etc. The loss of customer confidence in your organization may also affect the bottom line. This can cost millions of dollars and loss of business (all depending on the number of records exposed).

3) Copy Real Data for use in testing and have everyone sign non-disclosure agreements.

The Pro:

You now have real data, with all its different combinations, to test with, plus the legal protection of a non-disclosure agreement.

The Cons:

According to some studies, over 70% of all Data Breaches are non-malicious, and therefore agreements of this sort would not stop a breach.

We are also still looking at large volume issues.

Real data may not have all the information you need to test properly (testing for error handling, as an example).

4) Copy and obfuscate (scrub) the PII data so no one can figure out who the real data record represents

The Pros:

You get real data to work with, and even if a report ends up in a trash bin, no one can figure out who the data identifies or belongs to.

The Cons:

You will need to have a full understanding of your data.

You will have to do analysis work on how to scrub the data.

You will need to understand how the PII data works together within your environment/application.
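As a rough illustration of what 'scrubbing' while keeping the PII 'working together' can look like, here is a minimal sketch that replaces PII values with a keyed hash (HMAC-SHA256 from Python's standard library). This is one possible technique that I am assuming for the example, not the only way to obfuscate data. The field names, the sample records and the key are all hypothetical. The point it shows is that the same input always maps to the same replacement, so a scrubbed Tax ID in one table still matches the scrubbed Tax ID in another.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-key-kept-outside-test"

# Hypothetical list of PII columns; finding the real list is the analysis work.
PII_FIELDS = ("name", "tax_id")


def pseudonym(value: str, field: str, key: bytes = SECRET_KEY) -> str:
    """Replace a PII value with a stable, keyed-hash stand-in."""
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{field}_{digest[:12]}"


def scrub(record: dict, pii_fields=PII_FIELDS, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with the PII fields replaced."""
    return {
        k: pseudonym(str(v), k, key) if k in pii_fields else v
        for k, v in record.items()
    }


if __name__ == "__main__":
    # Made-up sample rows from two different 'tables' that share a Tax ID.
    account = {"name": "Jane Example", "tax_id": "000-000-000", "city": "Montreal"}
    transaction = {"tax_id": "000-000-000", "amount": 125.50}

    scrubbed_account = scrub(account)
    scrubbed_transaction = scrub(transaction)

    print(scrubbed_account)
    print(scrubbed_transaction)
    # The two scrubbed Tax IDs are equal, so joins between tables still work.
    print(scrubbed_account["tax_id"] == scrubbed_transaction["tax_id"])
```

Note that the stand-in values no longer look like names or Tax IDs, so any application that validates formats would need a different replacement rule, and that is part of the analysis work listed in the cons above.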

In my next blog I will further investigate all of the above options and discuss which option may be the most suitable for your situation. Maybe a hybrid solution could be the answer.

If you have any comments or questions, feel free to drop me a line.

As a note, this blog is not intended to be legal advice.


View Robert Galambos CIPP/C CIPP/IT VA3BXG's profile on LinkedIn
