In the previous posts I covered the issue of test data and privacy: the options that are generally available to 'address' the issue, and a description of each option.
This time I will wrap up this portion of the discussion and then further delve into related issues that may be of interest.
If you have read the previous post, you may have surmised that the option I favour is to analyze the data structure and elements, and then apply intelligent, business-savvy masking rules to the copied data. This entails designing a process that obfuscates the Personally Identifiable Information (PII) by applying rules, which take the business logic into account, to the information that is retained within the organization for testing purposes.
But at the same time, there is never an all-or-nothing answer to these issues. It all depends on the situation, the company culture and the requirements, to name but a few factors. But let me explain.
And by the way, I will try to stop myself from going down the path of techy talk that most non-IT people get lost in.
So let's assume the company we work for has decided that its existing testing environment(s) need to be scrubbed to ensure that they contain no real PII. Yet the CIO also insists that one of the requirements to ensure quality work is the ability to copy real data for testing from time to time.
So the requirement is to copy data when needed, removing the PII at the same time, while the scrubbed data still retains the quality that testing needs. We need to develop a process that scrubs the data consistently and have it executed whenever a data copy is to be done. Right? But how much data do we scrub? Do we need more than one copy? Who is going to be responsible for maintaining the obfuscation rules?
These are just some of the other factors that need to be considered.
As you start your analysis, you may come up with a question along these lines: will there be a need to sub-set the real data while copying it for testing?
The more revealing question may be: what are we going to be doing with this data after it is scrubbed? You might think that testing is the response, and you would be right. But what kind of testing? You see, in most medium to large companies there is more than one kind of testing done before any changes are put into the real world.
There is the testing that the coder/programmer does while making changes to the code, to ensure that the program works and produces the anticipated results. Generally speaking, this is called unit testing. In this case there may not even be a need for real data, just some made-up stuff. So we might not need to consider this type of testing in our requirement analysis.
Then there is what I call kernel testing. This means running a logical unit or series of 'programs' (yes, they can be stored procedures, scripts etc., but I am trying to keep the terminology simple and it really means the same thing) to see if it runs successfully with the changes. Usually this is where a small sample of real data would be used. The data used here does not have to be related to any other application/data, so the masking process would be rather easy to implement. There would be no need to ensure that the same rules that are applied here would be applied to another application within the organization.
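For those who do want a peek under the hood, here is a minimal sketch of what masking a small, stand-alone sample could look like. It is only an illustration under stated assumptions: the field names, the sample records and the salt value are all made up, and replacing values with a salted hash token is just one possible masking rule, not a recommendation for any particular organization.

```python
import hashlib

# Hypothetical list of fields treated as PII; a real list would come
# out of the analysis of the data structure.
PII_FIELDS = ("name", "email", "phone")

def mask_value(field, value, salt="test-env"):
    """Replace a PII value with a meaningless token such as 'name_1a2b3c4d'.

    The same input always produces the same token, so repeated copies
    are scrubbed consistently. The salt here is a placeholder; in
    practice it would need to be kept secret.
    """
    digest = hashlib.sha256(f"{salt}:{field}:{value}".encode("utf-8")).hexdigest()
    return f"{field}_{digest[:8]}"

def mask_record(record):
    """Return a copy of the record with PII fields masked, other fields untouched."""
    return {
        key: mask_value(key, val) if key in PII_FIELDS else val
        for key, val in record.items()
    }

# Made-up sample standing in for a small extract of real data.
sample = [
    {"id": 1, "name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100", "balance": 120.50},
    {"id": 2, "name": "John Roe", "email": "john@example.com", "phone": "555-0101", "balance": 75.25},
]

masked = [mask_record(r) for r in sample]
```

The non-PII fields (the id and the balance) pass through unchanged, which is what keeps the sample useful for testing.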
Next is some form of regression testing. Simply put, this is to make sure the application still works with the changes done to the code. However, you will probably not want the same number of records as production. Otherwise each test would take the same amount of resources/time as production. Remember, you are testing to make sure everything works, and if it doesn't, you need to correct the issue and retest. As the old adage goes, time is money. The quicker the programmers/coders can turn around the testing, the better. That means you will need to sub-set the data in question. An example would be to take a single branch's data as a test, versus the data of all the company's branches. However, this is not as easy as it sounds.
For example, if we have a banking application that we are going to be testing, we may decide to use only branch 'A' as the testing branch. This branch has a wide variety of customers etc. and it fits very nicely with the testing that needs to be done. We will need to copy only those customers within that branch (this information will most likely be held in some other database). We will then need to copy only those accounts of those customers within that particular branch. In other words, copy all the related information, and only the related information, for that branch, sub-setting the data. Oh, and don't forget that we will need to mask the data as it is being moved over from production, to avoid any potential issues further down the line.
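The branch example can be sketched in a few lines. This is a simplified illustration, not a real banking schema: the two lists stand in for production tables, and the field names and the masking rule (replacing the name with a generic label) are assumptions made up for the example.

```python
# Hypothetical in-memory stand-ins for production tables.
customers = [
    {"customer_id": 1, "branch": "A", "name": "Jane Doe"},
    {"customer_id": 2, "branch": "A", "name": "John Roe"},
    {"customer_id": 3, "branch": "B", "name": "Ann Poe"},
]
accounts = [
    {"account_id": 10, "customer_id": 1, "balance": 250.00},
    {"account_id": 11, "customer_id": 2, "balance": 75.25},
    {"account_id": 12, "customer_id": 3, "balance": 900.00},
    {"account_id": 13, "customer_id": 1, "balance": 10.00},
]

def mask_customer(customer):
    """Return a masked copy of a customer; the original record is not changed."""
    masked = dict(customer)
    masked["name"] = f"Customer {customer['customer_id']}"
    return masked

def subset_branch(customers, accounts, branch):
    """Copy one branch's customers (masked on the way) and only their accounts."""
    kept = [mask_customer(c) for c in customers if c["branch"] == branch]
    kept_ids = {c["customer_id"] for c in kept}
    # Keep an account only if its owner made it into the sub-set,
    # so the test copy has no accounts pointing at missing customers.
    related = [dict(a) for a in accounts if a["customer_id"] in kept_ids]
    return kept, related

test_customers, test_accounts = subset_branch(customers, accounts, "A")
```

The customer key is left unmasked on purpose, because it is what ties each account back to its customer. In a real system there would be many more related tables to follow in the same way, which is where the "not as easy as it sounds" part comes in.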
Next there may be a user acceptance test, which allows the users of the application in question to test the change(s) to ensure that they are what was asked for and that they work. While a complete copy of the data can be used, a sub-set of the data can also be used in most cases.
And then, as the next order of business, there may be a volume test. This test is normally done to ensure that the application can take the real-world volume (all the branches). It is the final kick of the tires, you could say.
Now, while I have generalized, and each company and its requirements are different, I hope you can see the complexity that is involved. The type of testing and the data used for that testing are extremely important, and it is just as important to analyze each testing requirement and come up with a solution that meets all the needs.
So let's assume that we have all the answers to the questions posed above. We know what kind of data we need, the various versions/copies and the other parameters that may have been discovered. What is next?
The next post will cover the how-to: the components of a privacy project, the pitfalls, the bumps on the road, and the elephant in the room (and yes, there is a BIG elephant that needs to be fed).
While it may not be directly related to a privacy role, anyone in privacy needs to understand the complexity inherent in the process that a company needs to go through, so that the project comes to a successful completion.
So I strongly suggest you stay tuned for the next installment. Till then, if you have any questions, feel free to contact me at the email address below.