Before we get started, let’s review some critical items that we covered last time.
The analysis phase of any Test Data Privacy project (as with any other IT project) is the linchpin: it is where you make or break the project. So to summarize, this step requires the following:
1) Identify the metadata of the application(s) in question
2) 'Marry' the metadata to the data stores (map each field in the metadata to its underlying table/file)
3) Inspect the potential PII fields/data to confirm they are actually fields that need masking. A sample of the data is helpful to show the SME if there are any questions about a field's contents.
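The 'marry' and sampling steps above can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `pii_candidates` mapping, the table/column names, and the use of SQLite as a stand-in for whatever data store you actually have.

```python
import sqlite3

# Hypothetical output of step 2: metadata field name -> (table, column) in the store
pii_candidates = {
    "Customer Tax ID": ("customers", "ssn"),
    "Customer Name":   ("customers", "full_name"),
}

def sample_field(conn, table, column, limit=5):
    """Pull a few example values so the SME can judge whether the field
    really contains PII that needs masking.
    NOTE: table/column come from trusted metadata, never from user input."""
    cur = conn.execute(f"SELECT {column} FROM {table} LIMIT ?", (limit,))
    return [row[0] for row in cur]
```

For each candidate field, `sample_field` gives the SME real example values to inspect before a masking decision is made.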
Then we have the design step. The following continues the discussion from my previous blog entry.
The SME is the critical member of the project team in this phase. He/she will be asked questions like: how do the fields identified as PII in the previous step interact with each other? A simple example: is there an edit to make sure the city and zip/postal code combination is valid? The rules should also be consistent throughout the environment. That is, if you age the birthday in one file in a certain way, you will need to age the birthday the same way in every other data store you have.
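One way to get that consistency is to make the rule deterministic: derive the change from the original value plus a project key, so every data store produces the same masked birthday for the same person. A minimal sketch, assuming ISO-format dates and a hypothetical shared key:

```python
import datetime
import hashlib

SECRET = b"project-masking-key"  # assumption: a key shared across all masking jobs

def age_birthdate(iso_date: str) -> str:
    """Deterministically shift a birth date backwards by 1-364 days.
    The same input always yields the same output, in every data store,
    so aged birthdays stay consistent across files."""
    d = datetime.date.fromisoformat(iso_date)
    h = hashlib.sha256(SECRET + iso_date.encode()).digest()
    offset = 1 + int.from_bytes(h[:2], "big") % 364  # never 0, so the date always changes
    return (d - datetime.timedelta(days=offset)).isoformat()
```

Because the offset is computed from the value itself, two files masked months apart still agree, with no shared lookup table to maintain.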
Now before we move on, I should address a question that ought to be brought up at this time. Are you going to need to sub-set the data while masking it? (See my blog "Testing and Data Privacy, is there an issue (final post or is it)?".) As I mentioned before, my expectation is that you will answer YES; the next question is HOW. Are you going to take a random set of customers (as an example) and mask all the related records of those customers? Or has the SME given you a list of branches that will be used for testing? In that case you also need to mask the customers of those branches, including their addresses and their SSN/SIN/Tax IDs, and extract only those products that the target branches have to sell, etc. What this all means is that you will need to design the extract process at the same time as the masking process. This can be a large hurdle to overcome, BUT the end results will more than make up for the effort. (This will be the subject of another blog entry in the future.)
WARNING WARNING WARNING
I’ve got your attention, I hope. What I need to highlight here is that the sub-setting of the data and the obfuscation of the data need to be done at the same time. Failure to do this may mean an increased chance of a data breach. Now back to your regularly scheduled program.
The actual masking rules depend not only on the requirements as defined by the SME and/or legal/privacy personnel (see above), but also on the toolset you have chosen. For example, if the toolset you are using does not support strong (128-bit or better) encryption, should you still use that technique for masking? And if you need to be able to reverse the obfuscation (for a legitimate reason), that may restrict what kind of rules/code can be used to mask the data in the first place.
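The reversibility question shapes the rule itself: a one-way hash can never be undone, while a reversible rule needs a keyed, invertible transform. The sketch below is purely illustrative; in particular the XOR "cipher" demonstrates reversibility only and is NOT real encryption. A production reversible rule would use a vetted algorithm from your toolset.

```python
import base64
import hashlib

def mask_irreversible(value: str) -> str:
    """One-way masking: fine when you never need the original back."""
    return hashlib.sha256(value.encode()).hexdigest()

KEY = b"demo-key"  # toy key for illustration only

def mask_reversible(value: str) -> str:
    """Toy XOR transform: shows the shape of a reversible rule -- NOT secure."""
    raw = bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(value.encode()))
    return base64.b64encode(raw).decode()

def unmask(masked: str) -> str:
    """Invert mask_reversible; only possible because the transform is keyed."""
    raw = base64.b64decode(masked)
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(raw)).decode()
```

If the business ever needs the original value back, only the second style of rule is an option, and that decision has to be made at design time.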
Another aspect that needs to be considered, but is often forgotten, is how the audit requirements will be satisfied for this project. And make no mistake about it: there will be a need for audit reporting for this process. Why do I say that? Because the masking process is most likely being driven by either regulatory requirements or best practices, and in either case some sort of ‘proof of the pudding’ will be required. This also needs to be taken into account within the project.
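Even a simple structured record per masking run goes a long way toward that ‘proof of the pudding’. A minimal sketch; the field names are my own assumptions about what an auditor would want to see, not a standard:

```python
import datetime
import json

def audit_record(table, field, rule, rows_masked):
    """One line of audit evidence: what was masked, by which rule, when."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "table": table,
        "field": field,
        "rule": rule,
        "rows_masked": rows_masked,
    })
```

Appending one such line per table/field to a secured log file gives you an on-request report almost for free.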
Once the design phase is finished, we move on to the coding. There is not much I can say here:
1) The chosen toolset will dictate how you code the rules, and what the limitations of those rules are
2) Reuse as much of the masking rules as you can. There is no need to reinvent the wheel if you can help it. Some tools allow one rule to be applied to many different data sources, and for obvious reasons that is something I encourage you to do as much as possible
Next is the implementation phase. This should be the easiest step. I mean, isn’t this just another IT project? And don’t you implement IT projects ‘all the time’? It should follow the same process, right?
Maybe. But to see whether it is easy, one needs to ask a series of questions first. Some examples are as follows:
1) How often will the obfuscation need to be run?
2) Who is responsible for the running of the process? Will it be production support, or will the users themselves run the series of jobs in question?
3) Will there be a need for user input before each run? (i.e., will the data sub-setting requirements change?)
4) How will change management be handled? In other words, if a file/field is changed or added, how will the masking process be updated? Who will do it? And how do you ensure nothing falls through the cracks?
5) Make sure the audit reporting is implemented. Is it on request, or will some sort of reporting need to be done every time? Will the reports need to be secured?
And in all these steps, make sure you document EVERYTHING, in a concise and accurate manner. Only with this done can you hope to ensure a successful, maintainable ongoing process. I would suggest setting up a Lotus/Excel worksheet to help with this.
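That worksheet can just as easily be generated as a CSV that any spreadsheet tool opens. The column names below are suggestions only; use whatever your project's documentation standard calls for.

```python
import csv

# Suggested columns for a masking inventory worksheet (assumption, not a standard)
FIELDS = ["data_store", "table", "field", "pii_type", "masking_rule", "sme_signoff"]

def write_inventory(rows, fh):
    """rows: iterable of dicts keyed by FIELDS; fh: open text file handle."""
    w = csv.DictWriter(fh, fieldnames=FIELDS)
    w.writeheader()
    w.writerows(rows)
```

One row per masked field gives you, in one place, the trail from analysis (what and where) through design (which rule) to sign-off.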
The intention of this blog is not to replace due diligence. Each IT environment is different, with its unique challenges. My sole intention is to try to help the community to tackle this concern head on. Experience tells me that this is a big task, but does not have to be daunting.
As the many clients I have known can attest, if one does this methodically, and with foresight, one can achieve a successful conclusion.
If you have any questions about this or any other topic that I post, or you want me to explore some issue, drop me a line at firstname.lastname@example.org.
Till next time