Before we get started, let’s review some critical items that
we covered last time.
The analysis phase of any Test Data Privacy project (as with all
other IT projects) is the linchpin: it is where you make or break the project. To
summarize, this step needs the following:
1) Identify the metadata of the application(s) in question.
2) 'Marry' the metadata to the data stores (match each field in the metadata to the underlying table/file).
3) Inspect the potential PII fields/data to confirm they are actual fields that need masking. A sample is useful to show the SME if there are any questions about the field contents.
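The three steps above can be sketched in a few lines. This is only an illustration: the field and table names are hypothetical, and a name-based screen is just a starting point that the SME must confirm against real sample data.

```python
import re

# Hypothetical metadata pulled from the application's data dictionary:
# (field name, table/file it maps to) -- step 2, "marrying" metadata to data stores.
metadata = [
    ("CUST_NAME", "CUSTOMER_MASTER"),
    ("BIRTH_DT", "CUSTOMER_MASTER"),
    ("TAX_ID", "CUSTOMER_MASTER"),
    ("BRANCH_CD", "BRANCH_REF"),
    ("POSTAL_CD", "CUSTOMER_ADDR"),
]

# Naive name-based screen for *potential* PII (step 3). A real project would
# confirm each hit with the SME against sample data, not trust the pattern.
PII_PATTERN = re.compile(r"NAME|BIRTH|TAX|SSN|SIN|POSTAL|ZIP|ADDR|PHONE|EMAIL", re.I)

def candidate_pii(fields):
    """Return (field, table) pairs whose names suggest PII."""
    return [(f, t) for f, t in fields if PII_PATTERN.search(f)]

for field, table in candidate_pii(metadata):
    print(f"review with SME: {table}.{field}")
```

The output here is a review list for the SME, not a final masking scope.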
Then we have the design step. The following continues
the discussion from my previous blog entry.
The SME is the critical member of the project team
in this phase. He/she will be asked questions like: how do the fields
identified as PII in the previous step interact with each other? A simple example: is there an edit to make
sure the city and zip/postal code combination is valid? The rules should be
consistent throughout the environment; i.e., if you age the birthday in one file in
a certain way, you will need to age the birthday the same way in every other data
store you have.
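One common way to keep a rule like birthday aging consistent across data stores is to derive the shift deterministically from a stable key. This is a minimal sketch, not any particular toolset's method; the secret and the customer key are assumptions for illustration.

```python
import datetime
import hashlib

SECRET = b"project-masking-key"  # shared by every masking job in the environment

def age_birthday(birth_date, person_key):
    """Shift a birth date by a deterministic offset derived from a stable key.

    Because the offset depends only on the key (e.g. customer number), the
    same person's birthday masks to the same value in every data store,
    which keeps cross-file checks and joins consistent.
    """
    digest = hashlib.sha256(SECRET + person_key.encode()).digest()
    offset_days = (int.from_bytes(digest[:2], "big") % 365) + 1  # 1..365 days
    return birth_date - datetime.timedelta(days=offset_days)

d = datetime.date(1980, 6, 15)
# Repeatable: the same customer masks to the same aged date everywhere.
assert age_birthday(d, "CUST0001") == age_birthday(d, "CUST0001")
assert age_birthday(d, "CUST0001") != d
```

Any rule with this "same input, same output everywhere" property will do; the point is that the design phase must pick one and mandate it for every data store.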
Now before we move on, I should address a question that
should be brought up at this time: are you going to need to sub-set the data
while masking it? (See my blog entry "Testing and Data Privacy, is there an
issue (final post or is it)?") After you
answer that question, the next one is HOW? (And as I mentioned before, my
expectation is that you will answer YES to the first question.) Are you going to
take a random set of customers (as an example) and mask all the related
records of those customers? Or has the SME given you a list of branches that
will be used for testing? In that case you also need to mask the customers of those
branches, including their addresses and their SSN/SIN/Tax IDs, and
extract only the products that the target branches have to sell, etc. What this all means is that you will need to
design the extract process at the same time as the masking process. This can be
a large hurdle to overcome, BUT the end results will more than make up for
the effort. (This will be the subject of another blog entry in the future.)
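Designing the extract and the masking together can look like the sketch below: the subset filter and the masking rule run in the same pass, so unmasked PII never lands in the test environment. The table, columns, and hash rule are all hypothetical stand-ins for whatever your toolset provides.

```python
import hashlib
import sqlite3

# A toy production table, stood up in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id TEXT, branch TEXT, tax_id TEXT, city TEXT);
    INSERT INTO customer VALUES
        ('C1', 'BR01', '111-22-3333', 'Toronto'),
        ('C2', 'BR02', '444-55-6666', 'Ottawa'),
        ('C3', 'BR01', '777-88-9999', 'Montreal');
""")

TARGET_BRANCHES = ("BR01",)  # the list of branches supplied by the SME

def mask_tax_id(tax_id):
    # A one-way keyed hash stands in for whatever rule your toolset provides.
    return hashlib.sha256(b"key" + tax_id.encode()).hexdigest()[:11]

# Extract ONLY the target branches' customers AND mask in the same pass.
subset = [
    (cust_id, branch, mask_tax_id(tax_id), city)
    for cust_id, branch, tax_id, city in conn.execute(
        "SELECT cust_id, branch, tax_id, city FROM customer WHERE branch IN (?)",
        TARGET_BRANCHES,
    )
]
print(subset)  # only the BR01 rows, with tax IDs already obfuscated
```

The key design choice is that there is no intermediate file holding subsetted-but-unmasked rows, which is exactly the point of the warning that follows.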
WARNING WARNING WARNING
I’ve got your attention, I hope. What I need to highlight here is that
the sub-setting of the data and the obfuscation of the data need to be done at the
same time. Failure to do this may mean an increased chance of a data
breach. Now back to your regularly
scheduled program.
The actual masking rules depend not only on the requirements, as
defined by the SME and/or legal/privacy personnel (see above), but are also
driven by the toolset you have chosen. For example, if the toolset you
are using does not support strong (greater than 128-bit) encryption, should you still use that
technique for masking? And if you need to be able to reverse the obfuscation (when
there is a legitimate reason), that may restrict what kind of rules/code
can be used to mask the data in the first place.
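The reversibility question splits the rule space in two, which a small sketch can make concrete. The one-way rule uses a keyed hash; the "reversible" rule here is deliberately a toy character shift, NOT real cryptography; a production toolset would use vetted encryption or format-preserving encryption with proper key management.

```python
import hashlib
import hmac

KEY = b"masking-key"  # key management is the hard part in real deployments

def mask_one_way(value):
    """Irreversible: fine when no one ever needs the real value back."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

# A reversible rule (a toy substitution shift, for illustration only) lets
# authorized staff recover the original when there is a legitimate reason.
def mask_reversible(value):
    return "".join(chr((ord(c) + 7) % 128) for c in value)

def unmask_reversible(masked):
    return "".join(chr((ord(c) - 7) % 128) for c in masked)

ssn = "111-22-3333"
assert unmask_reversible(mask_reversible(ssn)) == ssn  # round-trips
# mask_one_way(ssn) cannot be round-tripped -- there is nothing to decrypt.
```

If the requirement is "must be reversible," the one-way family of rules is off the table before coding even starts, which is why this decision belongs in the design phase.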
Another aspect that needs to be considered, but is often forgotten, is
how the audit requirements will be satisfied for this project. And make no
mistake about it: there will be a need for audit reporting for this process.
Why do I say that? Because the masking process is most likely being
driven either by regulatory requirements or by best practices, and in either case
some sort of 'proof of the pudding' will be required. This also needs to be taken
into account within the project.
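That 'proof of the pudding' can be as simple as a manifest emitted by every masking run. The field names below are illustrative, not a standard; the point is that an auditor can see what ran, when, and whether every row was covered.

```python
import datetime
import json

def audit_record(run_id, rules_applied, rows_in, rows_masked):
    """Build an audit manifest for one masking run.

    Field names are illustrative -- the real set should come from whatever
    your regulators or internal auditors actually ask for.
    """
    return {
        "run_id": run_id,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "rules_applied": rules_applied,
        "rows_in": rows_in,
        "rows_masked": rows_masked,
        "fully_masked": rows_masked == rows_in,  # a row-count reconciliation
    }

rec = audit_record("RUN-042", ["age_birthday", "hash_tax_id"], 1000, 1000)
print(json.dumps(rec, indent=2))
```

Whether these manifests are produced on request or on every run is one of the implementation questions further down.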
Once the design phase is finished, we move on to the
coding. There is not much I can say here:
1) The chosen toolset will dictate how you code the rules, and the limitations of those same rules.
2) Try to reuse as many of the masking rules as you can. There is no need to reinvent the wheel if one can help it. Some tools allow one rule to be applied to many different data sources, and for obvious reasons that is something I encourage you to do as much as possible.
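Applying one rule to many data sources can be as simple as a binding table that maps each (source, field) pair to a shared rule. Everything named here is hypothetical; the shape is what matters.

```python
import hashlib

def redact_name(value):
    """A single masking rule, written once and reused everywhere a name appears."""
    n = int(hashlib.md5(value.encode()).hexdigest(), 16) % 10000
    return f"CUSTOMER {n:04d}"

# Bind the same rule to name fields in several data stores -- no reinvented wheels.
RULE_BINDINGS = {
    ("CUSTOMER_MASTER", "CUST_NAME"): redact_name,
    ("LOAN_FILE", "BORROWER_NM"): redact_name,
    ("CARD_SYSTEM", "HOLDER_NAME"): redact_name,
}

def apply_rule(source, field, value):
    return RULE_BINDINGS[(source, field)](value)

# The same input masks identically no matter which system it came from.
a = apply_rule("CUSTOMER_MASTER", "CUST_NAME", "Jane Doe")
b = apply_rule("LOAN_FILE", "BORROWER_NM", "Jane Doe")
assert a == b
```

Reuse like this also pays off in the consistency requirement from the design phase: one rule means one behavior everywhere.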
Next is the implementation phase. This should be the easiest
step. I mean, isn’t this just another IT project? And don’t you implement IT
projects ‘all the time’? It should
follow the same process, right?
Maybe. But to see if it is easy, one needs to ask a series
of questions first. Some examples are as follows:
1) How often will the obfuscation need to be run?
2) Who is responsible for running the process? Will it be production support, or will the users themselves run the series of jobs in question?
3) Will there be a need for user input before each run (i.e., will the data sub-setting requirements change)?
4) How will change management be taken care of? In other words, if a file/field is changed or added, how will the masking process be updated? Who will do it? And how do you ensure nothing falls through the cracks?
5) Make sure that the audit reporting is implemented. Is it on request, or will some sort of report need to be produced every time? Will the reports need to be secured?
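The change-management question (number 4 above) lends itself to a simple automated check: compare the live schema against the set of fields the masking design already knows about, and flag anything new for review. The names below are illustrative assumptions.

```python
# Fields the masking design has already reviewed and covered.
KNOWN_FIELDS = {
    ("CUSTOMER_MASTER", "CUST_NAME"),
    ("CUSTOMER_MASTER", "TAX_ID"),
}

def unreviewed_fields(live_schema):
    """Return fields present in the live schema that the masking design
    has never seen -- the candidates for falling through the cracks."""
    return sorted(set(live_schema) - KNOWN_FIELDS)

live = [
    ("CUSTOMER_MASTER", "CUST_NAME"),
    ("CUSTOMER_MASTER", "TAX_ID"),
    ("CUSTOMER_MASTER", "MOBILE_PHONE"),  # a newly added column
]
print(unreviewed_fields(live))
```

Running a check like this before each masking run turns "who notices the new field?" from a hope into a gate.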
And in all these steps, you should make sure you document
EVERYTHING in a concise and accurate manner. Only with this done can one
hope to assure a successful, ongoing, maintainable process. I would suggest
setting up a Lotus/Excel worksheet to help with this.
The intention of this blog is not to replace due diligence.
Each IT environment is different, with its unique challenges. My sole intention
is to try to help the community to tackle this concern head on. Experience
tells me that this is a big task, but does not have to be daunting.
As the many clients I have known can attest to, if one does
this methodically, with foresight, one can achieve a successful conclusion.
If you have any questions about this or any other topic that
I post, or you want me to explore some issue, drop me a line at rgalambos@gmail.com.
Till next time