Monday, March 18, 2013
Testing and Data Privacy: Is There an Issue? (Part III of IV)
Let's recap. In the previous posts I discussed why you should be aware of how application changes are tested within your IT department, or you may have a data breach before you know it. Then I explained how to mitigate some of the risks with different processes and choices, and listed the pros and cons of each of them.
I will now continue the discussion of the various options and which ones work best.
So let's get started.
The four choices that I presented previously are:
1) Create your own test data
2) Copy production data into the test environment
3) Same as #2 but have everyone sign Non-disclosure agreements
4) Same as #2 but obfuscate (scrub) the data
Let's start with the obvious one, option #2. That is clearly TABOO, or is it? The reason we should not do this is obvious, right? Yet copying data is what happens in the real world today. As far as I know there are no studies along these lines (most companies would not want to share this type of information), but experience tells me that you would be surprised at the number of companies that have at least some areas where this is done regularly. While it can be argued that this happens only within smaller companies, experience says otherwise. Remember that you may have a policy in place forbidding this, but in some corner of IT that has been around for years, they may still be practicing "copy the data" because that is how it was always done. That being said, you may be surprised to hear me say that there can be times when there is a legitimate reason (fooled you) to copy production data into a testing environment.
This will be a topic for a future post concerning (and this is a BIG hint) testing, cost, risk and support issues that revolve around data and data privacy.
For now let's just say this is not a good option and should only be considered in specific areas and for specific reasons.
Option #3, in my opinion, is slightly different from just 'saying no'. It should be standard policy that all individuals, no matter who they are (employees, consultants or outsourcers), need to sign a non-disclosure agreement. But let me be clear: this will not help in preventing any data breaches. And just to remind you why, there are studies concerning data breaches that state that more than 70% of all data breaches are non-malicious. If the breach is malicious (disgruntled employee, criminal activity etc.), an agreement will not stop data from being exposed either. So if it does not prevent breaches, why bother? What it does is make legal remedies easier to pursue in case there is a need.
Option #1 is a viable option. Many companies I have worked with have policies along those lines, and in fact chances are that your testers will have to make up some data anyway to test things that should not happen in real life, for example when testing error checking and handling. But is it the be-all and end-all? No. One can never make up all the permutations and combinations one would need to test to ensure, first, that the change worked and, second, that it did not break anything else. Now there are processes that mitigate the risks involved (for another post). However, there are no guarantees.
Last but not least there is option #4. This option states that all production data copied over to testing should have the Personally Identifiable Information (PII) scrubbed. There are problems even with this option. To do a good job of scrubbing the data takes time, money and expertise, and carries some risk. (It took me two years to be able to even pronounce obfuscate, never mind spell it, so scrub is the term I use. It describes the option just as well and rolls off my tongue more easily.)
So what does the process entail? How does one go about scrubbing data? The first step is to identify all the fields that contain PII. Easy, right? Nope. In this complex world we live in, I can assure you that no 'data' is an island, entire of itself (to paraphrase John Donne).
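As a rough first pass at that identification step, one could scan table layouts for column names that look like PII. The sketch below is only an illustration: the name patterns, table names and column names are all made up, and a name-based scan will miss PII hiding in oddly named or free-form fields, so it is a starting list for discussion rather than a finished inventory.

```python
import re

# Hypothetical name patterns. A real inventory would also need data
# profiling and input from the people who own each application.
PII_NAME_PATTERNS = [
    "name", "addr", "zip", "postal", "phone", "email", "ssn", "card", "birth",
]

def find_candidate_pii(schemas):
    """Return {table: [columns]} for columns whose names match a PII pattern."""
    pattern = re.compile("|".join(PII_NAME_PATTERNS), re.IGNORECASE)
    hits = {}
    for table, columns in schemas.items():
        matched = [column for column in columns if pattern.search(column)]
        if matched:
            hits[table] = matched
    return hits

# Made-up layouts loosely based on the billing example in this post.
schemas = {
    "ar_bill": ["bill_id", "customer_name", "card_number", "amount"],
    "gl_posting": ["posting_id", "bill_id", "amount"],
    "customer_info": ["customer_id", "customer_name", "street_address", "zip_code"],
}

candidates = find_candidate_pii(schemas)
for table, columns in candidates.items():
    print(table, columns)
```

Notice that the same customer_name shows up in two different tables, which is exactly why the fields cannot be scrubbed in isolation.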
Programs (applications, processes etc.) work together. The bill that is entered in the Accounts Receivable system needs to be posted to the GL, as an example. The bill also has the purchaser's credit card number, which feeds the credit card processor. The address on the bill is entered in the customer information system.
This interaction can be complex to say the least. One application has edits in place to verify a Zip/Postal code matches the address because the program that sends out mail needs to make sure the combinations make sense. But the application that is used for analyzing buying habits may not even look at this.
Once all the PII fields have been discovered, along with how they are related across applications, files and databases, the next step is to figure out what method should be used to scrub the data given the interaction I just described. Do we scramble the values, or should we generate new ones? Does the data need to follow certain business rules? Are there home-made systems that need to be used to mask the data (e.g. an account number generator)?
There are basically four different types of scrubbing methods.
#1 A simple scrambling method. For example, changing the letter 'A' to 'X' wherever it appears. (There are variations of this that make the results harder to reverse.)
#2 Looking up a translation table. The original value is used, by various methods, as a key to find an entry within the translation table. So if that value appears in another location, the same scrubbed value is returned.
#3 Generating new data, either randomly or with some guidelines. This is an issue because every time the same value is scrubbed the result will be different, thus losing consistency.
#4 Replacing the data with a fixed string, a blank etc. As an example, putting 'N/A' in each free-form field because no processing is done on that data.
And there are other techniques that I have not covered here, such as date aging, flip-flopping of real data, mathematically manipulating the values etc.
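To make the four basic methods a little more concrete, here is a minimal sketch of each one. Everything in it is an assumption made for illustration: the function names, the three-letter shift, the tiny table of fake surnames and the secret key are all invented, and the keyed hash is just one of the "various methods" one could use to pick a translation table entry.

```python
import hashlib
import hmac
import random
import string

# Method 1: simple scrambling with a fixed letter substitution
# (here every letter is shifted three places, so 'A' becomes 'D').
SHIFT = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase,
    string.ascii_uppercase[3:] + string.ascii_uppercase[:3]
    + string.ascii_lowercase[3:] + string.ascii_lowercase[:3],
)

def scramble(value):
    return value.translate(SHIFT)

# Method 2: translation table lookup. The original value is hashed with a
# secret key and the hash picks an entry, so the same input always returns
# the same scrubbed value wherever it appears.
FAKE_SURNAMES = ["Alder", "Birch", "Cedar", "Dogwood", "Elm"]

def lookup(value, secret=b"hypothetical-secret"):
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_SURNAMES)
    return FAKE_SURNAMES[index]

# Method 3: generate new data. Each call gives a fresh value, which is
# exactly why consistency between files is lost.
def generate_digits(length, rng=random):
    return "".join(rng.choice(string.digits) for _ in range(length))

# Method 4: replace the value with a fixed string.
def blank_out(_value, filler="N/A"):
    return filler

print(scramble("Abc"))                     # Def
print(lookup("Smith") == lookup("Smith"))  # True
print(generate_digits(16))
print(blank_out("call customer after 5pm"))
```

With a table this small, different real names will often land on the same fake one. Whether that matters depends on the business rules the test data has to satisfy.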
After it is determined which techniques are to be used, the next step is 'coding' the rules to be applied and then testing them. Expect this to be an iterative process, because the more you do, the more you will find that you may have missed something.
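One way to picture 'coding' the rules and testing them is a small table that maps each PII field to its scrubbing function, plus a check that flags any listed field that came through unchanged. This is only a sketch under my own assumptions: the field names, the secret and the helper names (mask_digits, scrub_record, check_scrubbed) are hypothetical, and a real process would also have to check the business rules each application enforces.

```python
import hashlib
import itertools

def mask_digits(value, secret="hypothetical-secret"):
    """Deterministically replace each digit, keeping length and punctuation."""
    digest = hashlib.sha256((secret + value).encode("utf-8")).hexdigest()
    stream = itertools.cycle(digest)  # hex characters, reused if value is long
    return "".join(
        str(int(next(stream), 16) % 10) if ch.isdigit() else ch
        for ch in value
    )

# The rules: which field gets which scrubbing method (made-up field names).
RULES = {
    "card_number": mask_digits,
    "notes": lambda value: "N/A",
}

def scrub_record(record, rules=RULES):
    """Apply the rule for each field that has one; pass the rest through."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }

def check_scrubbed(original, scrubbed, rules=RULES):
    """Return the fields that have a rule but still hold the original value."""
    return [
        field for field in rules
        if field in original and original[field] == scrubbed[field]
    ]

original = {
    "bill_id": "B-1001",
    "card_number": "4111-1111-1111-1111",
    "notes": "customer prefers a call after 5pm",
}
scrubbed = scrub_record(original)
print(scrubbed)
print(check_scrubbed(original, scrubbed))
```

Each time a missed field turns up, it gets a new entry in the rules table and the check is run again, which is the iterative part.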
And finally the implementation of the process.
This is not an easy task, nor is it something that should be taken on lightly. But if you don't want to have your company in the cross hairs of journalists, bureaucrats, the courts and the general public, you need to do your due diligence (making sure you do the best you can to prevent data leakage).
In the next chapter I will talk about how this fits together in the overall picture, and how one needs to consider other factors when talking about testing.