Monday, May 27, 2013

Musings on Big Data and Privacy

Big Data and Privacy. Or: should a big box store figure out if someone is pregnant?

Is that Private?


So what is Big Data? Is it the latest 'fashion statement' from the IT world? A bunch of numbers and letters that represent something or someone? Something of an asset?

All of the above and more. Basically, it is the information, or data, generated by everyone and everything. Examples of Big Data include this particular blog entered on the web, the decoding of the human genome, the buying habits of your customers, your credit score, etc.

It's 'stuff'.
Google’s CEO Eric Schmidt stated: “From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating.”

So that is Big Data. But how does it concern privacy? Before we go there, let's reflect on the issue.

Companies are generating great mounds of data: everything from what grocery items you purchase (those customer loyalty cards) to what credit cards you use and where.

This is an asset to the company. It can be analyzed, inspected, and reported on, all to get an edge on the competition: a better understanding of the customers, and how to market to them for the best results. What tickles their fancy, so to speak? Maybe get that same customer to buy milk from your company as well as the clothing they buy now.

While doing the research for this blog, I came across an interesting case study concerning this issue.
A major big box chain's (not Wal-Mart) department of thinkers (not a real department, but it could well have been named that) got together to see if they could 'predict' which of their customers were pregnant.

The reason: if they could get that pregnant customer to start buying the 'stuff' needed for the happy occasion, they could influence her buying patterns in the future. A better 'bottom' line (pun intended).

They had all this raw data about their clients and their buying habits. They could mine the information (Big Data) and determine whether there were any patterns. And the results were, to say the least, eye-opening.

Now, this blog is not the place for a detailed discussion of the model, but needless to say, the mathematical model that was developed was successful more than 87% of the time in predicting, based solely on buying habits, which of their clients were pregnant. They were then able to target the pregnant customers with coupons, flyers, etc., in hopes of getting them to buy more 'STUFF'.

This was done without a client filling out a form letting the company know she was expecting; Ms. Jane Doe, customer, had yet to buy a single diaper. The mining of this client's buying habits from the company database was the only determining factor.
That is what Big Data is, and what it can do.
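To make the idea concrete, here is a toy sketch in Python of how such buying-habit scoring could work. The product names, weights, and threshold below are invented for illustration; they are not the retailer's actual model, which reportedly used a far larger basket of products and proper statistics.

```python
# Hypothetical 'marker' products, each nudging a pregnancy-likelihood
# score upward. Real models would be trained, not hand-weighted.
MARKER_WEIGHTS = {
    "unscented lotion": 0.3,
    "calcium supplement": 0.25,
    "cotton balls": 0.2,
    "large tote bag": 0.15,
}

def pregnancy_score(purchases):
    """Sum the weights of any marker products in a purchase history."""
    return sum(MARKER_WEIGHTS.get(item, 0.0) for item in purchases)

def likely_pregnant(purchases, threshold=0.6):
    """Flag a customer whose score crosses an (invented) threshold."""
    return pregnancy_score(purchases) >= threshold

history = ["unscented lotion", "calcium supplement", "cotton balls", "bread"]
print(likely_pregnant(history))  # prints True
```

The point of the sketch is that no one ever tells the store anything; the flag emerges purely from what was bought.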

Can you see the privacy issues in all this? Actually, there are three different issues when dealing with Big Data.

Is what the company doing legal?
Is it ethical?
Is it acceptable to the general public? 
Let's tackle the legality first.
It’s not a simple answer. There are a lot of variables involved. Where does the customer live? Did he/she give permission to the company to use the data collected for internal (and maybe external) use? These are but two questions that privacy officers need to deal with, address and ultimately sign off on. 

Generally speaking, we can assume that when a customer signs up for a loyalty card, there is some form of authorization to use the data. Or at least best practice demands some sort of disclosure, if nothing else. This may be the easiest of the three questions.

Is it ethical? 

PhD theses have been written about this very question for years. There is no government review panel to determine whether or not something is ethical, but the question is still very valid. One education site states that 'ethics refers to standards of behavior that tell us how human beings ought to act in the many situations..'

http://www.scu.edu/ethics/practicing/decision/framework.html
While there are no hard and fast rules on what is and is not ethical, one can, if for no other reason, look in the mirror and ask the question: is this OK?

Is it then acceptable? 

Going back to the story above, let's see what happened. After the store created the model, they started sending flyers and coupons targeting the would-be moms. Examples like diaper coupons and flyers featuring cribs were sent out to the targeted group.

Well, you can imagine what happened next. Many irate customers wondered, first of all, how this company knew they were expecting. Even more damaging to the company's reputation was the fact that they were sending baby-oriented coupons to non-pregnant clients. And what if those targeted accounts were teenagers, and/or single, and/or religious?

A public relations nightmare. In fact, while doing the research, I was surprised that this had not been thought through more thoroughly by the marketing department of the company.

All these factors play in the realm of Big Data. And privacy is just one of those factors.

Ultimately, the people responsible for privacy need to assure themselves that the use of the data is within legal constraints. 

It can be more complicated if the data being analyzed is sent out to another company. There are 'mounds' of companies whose only job is to massage the data and make sense of it. They can then market to those clients with targeted campaigns as successfully as possible (the pregnant ladies from the above example), to get the best return on the data (the Big Data).

Big Data means being able to see trends and patterns, not determining individuals' buying habits per se.

No one at Costco cares if the individual named Robert will buy a steak or a bottle of milk. What they do care about is influencing the group that Robert 'belongs to' so they can somehow get that targeted group to buy both products (as an example).

 So an argument concerning privacy can go something like this:

It's not the PII of a particular person that is being used (for the most part) in this type of analysis, but the fact that a customer bought an item and he is middle-aged, six feet tall, lives in a middle-class area, etc., and belongs to a statistical group that represents 25% of the customer base in a particular region.
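A minimal sketch of that aggregate view: the analysis is keyed by a segment label, never by a name. The segments and purchase records below are made up for illustration.

```python
# Count what each customer segment buys; no names or other PII
# ever enter the tally -- only a coarse segment label per record.
from collections import Counter

purchases = [
    {"segment": "middle-aged, suburban", "item": "steak"},
    {"segment": "middle-aged, suburban", "item": "milk"},
    {"segment": "student, urban", "item": "milk"},
    {"segment": "middle-aged, suburban", "item": "steak"},
]

def items_by_segment(records):
    """Aggregate item counts per segment for trend analysis."""
    counts = {}
    for r in records:
        counts.setdefault(r["segment"], Counter())[r["item"]] += 1
    return counts

print(items_by_segment(purchases)["middle-aged, suburban"]["steak"])  # → 2
```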

Maybe. But then again is that the only usage of these great mounds of data?

The debate on Big Data, how to handle it, and its ramifications for privacy will continue. What we need to do is have the dialogue, ask the questions, and figure out what can and should be done.

The concerns won't go away, and ignoring the issues will only make them worse. We all need to first understand the issues and then make 'a go at it,' while at the same time making sure we don't shoot ourselves in the foot.


Wednesday, May 22, 2013

Testing, in the black box (ATV), Security & Privacy



How Automated Testing Vehicles (ATVs) should include pentesting.

Why should privacy officers get involved in the development and regression testing process?

Why does IT need to improve their testing strategies?

Pitfalls in testing and security/privacy concerns are what give people nightmares. Privacy officers need a better understanding of the environment they work in. The IT people need to embrace the notion that privacy/security starts from the beginning. That way, the chances of being on the front page of a newspaper because of a breach and/or a failure are minimized. NO ONE wants to phone the CIO about a problem like this. It is a team effort.

I do have to warn you, the reader, that some of the material may be a little IT-oriented. But in an organization where one needs to satisfy a number of different objectives, I would suggest at least a basic knowledge of the IT process is needed. And the IT personnel need to understand the present compliance/regulatory landscape.

Some definitions are warranted before I begin.

ATV, or Automated Testing Vehicle. What is it? Why do I care? And is it a 'best practice' (one of the most overused phrases at present)?

The idea is fairly simple: a set of scripts (automated) that can be run to test the system in question. The objective is to test the system before any changes are implemented. The process should set up the files that will be used for testing (see one of my previous blog posts concerning using data for testing), then run the test scripts, and afterwards run the comparison reports and highlight items of concern from the test just executed. All this is done in an automated fashion. A rather simple concept, but one that can change processes in a good way.
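The loop just described can be sketched in a few lines. The script names and the runner below are hypothetical placeholders, not any particular testing tool:

```python
# Bare-bones ATV loop: run each test script, compare its output to an
# expected baseline, and collect anything that differs for review.
def run_test_suite(scripts, baselines, run_script):
    """Run each script via run_script; flag results that miss baseline."""
    concerns = []
    for name in scripts:
        result = run_script(name)           # execute one automated script
        if result != baselines.get(name):   # the comparison-report step
            concerns.append(name)
    return concerns

# Simulated results standing in for a real test execution environment.
fake_results = {"login_test": "OK", "checkout_test": "FAIL"}
baselines = {"login_test": "OK", "checkout_test": "OK"}
flagged = run_test_suite(["login_test", "checkout_test"], baselines,
                         fake_results.get)
print(flagged)  # → ['checkout_test']
```

In a real ATV the runner would invoke actual test scripts and the baselines would come from a known-good run; the shape of the loop stays the same.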

Well there is more to this. But let me define another term or two first.

IT systems that are down cost the enterprise money in lost revenue and goodwill. As an example, in 2012 Google had an outage.
Google June 2012 down for 10 min.

The ballpark cost that Google suffered was calculated at about $750,000. And that was for 10 minutes. Now, I am not suggesting all downtime costs are that high. It depends on the circumstances, but I am sure no one would like to find out for their own company.

Another good example of the costs is cited at costs of web down time per industry

This site allows you to calculate the cost of a website being down, per industry/application. It's an eye-opener, to say the least.
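To put those numbers in perspective, here is the back-of-the-envelope arithmetic as code, using the Google figure quoted above to derive a per-minute rate:

```python
# Downtime cost estimate derived from the quoted outage figure:
# $750,000 over a 10-minute outage implies about $75,000 per minute.
def downtime_cost(minutes, cost_per_minute):
    """Estimated revenue/goodwill cost of an outage of given length."""
    return minutes * cost_per_minute

google_rate = 750_000 / 10       # ≈ $75,000 per minute for that outage
print(downtime_cost(10, google_rate))  # → 750000.0
```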

In another 'word', downtime is BAD/EXPENSIVE. *Yes, I know that is two words.* But joking aside, we need to reduce unavailability as much as possible.

PenTesting. Wikipedia link  The Information Systems Audit and Control Association (ISACA) defines penetration testing as "A test of the effectiveness of security defences through mimicking the actions of real-life attackers."

(For the reader who is more concerned with Privacy/Security, please read on)

So now let's proceed. When an application change happens, IT personnel (or a designated organization) test the changes (i.e., regression testing). They test the change to see if it works. Depending on the process that is followed, a user may also test/approve the same series of changes to the application for user approval. Fine, right? Do you notice something missing in the above? In fact, there is more than one item here that needs to be defined/explored.

For many organizations, testing to maintain the basic functions within an application happens in a haphazard way. Sure, the change is tested, and to get to the enhancements some basic functions are tested as well. But, based on my anecdotal experience, on many occasions the core functions of the changed application are not tested on a consistent basis. All the basic core functions should be completely tested whenever there is a change.

As an example, if the application in question is a public-facing web application (a web store, say), basic function testing should also be done. Test, for example, the ability to add/change credit card information and make sure that the update still works. Test adding an item to the shopping cart, etc.

So if the new function within the application fails, you have verified that the basic core functions, the ones you need to keep the doors open, will still operate.
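As a sketch, the core 'keep the doors open' checks for a hypothetical web store might look like the following. The `Cart` class is a stand-in for illustration, not a real storefront API, and the card number is a standard test value:

```python
# Minimal stand-in for a web store's cart, used only to illustrate
# what a core-function smoke test exercises.
class Cart:
    def __init__(self):
        self.items = []
        self.card = None

    def add_item(self, item):
        self.items.append(item)

    def update_card(self, number):
        self.card = number

def core_smoke_tests():
    """The checks that must pass no matter what change is shipping."""
    cart = Cart()
    cart.add_item("widget")                # core: add to shopping cart
    assert "widget" in cart.items
    cart.update_card("4111111111111111")   # core: update credit card info
    assert cart.card is not None
    return "core functions OK"

print(core_smoke_tests())  # → core functions OK
```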

Imagine an error occurs at your bank with the 'improved mobile bank portal' (the change being implemented), yet the basic functions were tested successfully. Logic would then dictate that the basic functions should still work (you can still pay bills) even if the enhancement of the bank's mobile app does not. Corrections can be retested and implemented with minimal cost/embarrassment to the organization.

I am therefore advocating that there be standard testing scripts that confirm, even with the changes that are going to be implemented, that ALL the core functions are still accessible.

So to implement a process like this, you first need to map out the basic functions that you cannot live without. Once that is done and scripts are created, an automated process should be built. When ready, a series of scripts can be executed with little human intervention (less chance of human error). The 'best practice' (there is that phrase again) would be something along the lines of submitting the scripts and going home. When you get into the office the following day, the results are ready for analysis/correction, etc.

This should ensure that even if the new change fails, you, the customer, can still do business with the organization in question. This is what some people call an ATV (see above). Think of this process as your insurance policy.

However, let's take this further. Why test just the basic functionality of the application? Should we also test for security/privacy issues? Should the company's privacy/security office ensure that this type of testing and verification is also included within an ATV and executed whenever anything changes?

Absolutely!

A process that includes pentesting (see above) is something one should consider adding to the above-mentioned ATV. With any change there is always a chance that a vulnerability is created that was not there before.

Any failure can, by its very nature, potentially expose sensitive information: business secrets and/or Personally Identifiable Information (PII), to name but two potential headaches.

There is software in the marketplace that has the capability to engage/test/analyze applications for vulnerabilities. I have mentioned some of it previously, and other products with the needed capabilities are also available.

So I suggest that one create an ATV process that covers the basic functionality of the application/system in question as well as additional testing for security/privacy. All this should be automated, so that more extensive testing can be executed while reducing the chance of human error.
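One small security-oriented check that could run inside such an ATV is verifying that a page's HTTP response still carries baseline security headers after a change. This is a sketch of one such check, not a full pentest, and the header list is a common baseline rather than an exhaustive one:

```python
# A common baseline of HTTP response security headers; a real ATV
# would likely check more, per current hardening guidance.
REQUIRED = ["Strict-Transport-Security", "X-Content-Type-Options",
            "X-Frame-Options"]

def missing_security_headers(response_headers):
    """Return any baseline security headers absent from a response."""
    present = {h.lower() for h in response_headers}
    return [h for h in REQUIRED if h.lower() not in present]

# Simulated response headers from a post-change test run.
headers = {"Content-Type": "text/html", "X-Frame-Options": "DENY"}
print(missing_security_headers(headers))
# → ['Strict-Transport-Security', 'X-Content-Type-Options']
```

A check like this costs almost nothing to run on every change, which is exactly the point of folding it into the automated suite.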

Privacy officers need to ensure that any changes that are implemented will not cause exposure that may be costly. IT people need to make sure that the basic systems functions still run, no matter what is changed.

Finally, while no one can claim in absolute terms that there will be no issues, following these basic concepts can help reduce the chance that the CIO needs to be called because of an issue.


Monday, May 6, 2013

Privacy for IT, Security for PO, Privacy by Design (PbD).


So far I have tried to tackle how different professionals look at privacy differently and how stakeholders are an important piece of the pie.

What I am going to try to address within this post is how technical ideas affect privacy and security, as well.

I will also attempt to provide some guidance concerning some of the issues I will discuss here.

Please note, I have no relationship with any of the companies that I mention here, or in any other posts that I have written. Also, it is up to the reader to do their own due diligence.

Now, the reader may have some level of knowledge of the 'techie' stuff, but I will try not to make any assumptions. What I want to do is highlight some aspects, describe them for those who may not be as technically inclined, and provide some resources where more research can be done.

Some lay people use the words security and privacy interchangeably. While security is needed to maintain privacy, it can mean other things as well. For example, it is generally accepted that the physical security of a public-facing office (banks, insurance agents' offices, etc.) needs to be addressed, both to protect the employees (a non-privacy issue) and to protect the company's customers from data breaches, which is a privacy concern.

What I am going to deal with here is the security that is needed to protect Personally Identifiable Information (PII).

So let's get started.

Security

Hopefully, when a developer starts coding a new application, or making enhancements to an existing one, he/she will know how to code to prevent security holes. But as we all know, we are all human.

So what can we do?

A new type of software is emerging that can help developers by highlighting what they should be coding. This comes in the form of questions/guidance drawn from a knowledge base. The objective is to build security into the design document (the document that describes how the programs work together and are coded, given the requirements of the application being worked on). The design document would then specify the defences that need to be incorporated within the code.

The two software products that I am aware of that fall within this category are:

1) SD Elements (http://www.sdelements.com)

2) Security Innovations (https://www.securityinnovation.com)

Both have their strengths and weaknesses. They also tackle this aspect of secure coding in very different ways.

As an analogy, let us use the example of your car (or your friend's car, if you don't have one <S>), or boat, bike, etc. Which is cheaper: changing your oil every X km/miles, or waiting for the engine to seize when the oil can no longer do its job?

On average, it costs about $4,000 to fix a vulnerability in an application (SD Elements). According to White Hat Security (https://www.whitehatsec.com/resource/stats.html), there are, on average, 56 vulnerabilities per website (2012). So let's do some math, shall we?

It will cost $4,000 times 56, on average, to fix all the security problems on a public-facing website, for an average total of $224,000.

You can close your mouth now.

And to top it all off, 85% of all websites White Hat tested had at least one vulnerability. To make matters worse, it took, on average, 193 days from the date an issue was detected until it was resolved. Never mind that in 61% of the White Hat-tested websites that had vulnerabilities, the issues were never fixed in the first place.
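The arithmetic above, expressed as a tiny estimator. The $4,000 and 56 figures are the quoted averages from SD Elements and White Hat Security, not universal constants:

```python
# Rough remediation-cost estimate: average cost per vulnerability
# times the average number of vulnerabilities per site.
def remediation_cost(vulns_per_site, cost_per_vuln=4_000):
    """Estimated cost to fix all vulnerabilities on one site."""
    return vulns_per_site * cost_per_vuln

print(remediation_cost(56))  # → 224000
```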

In other words, best practices, as well as the ROI, demand that we try to nip this issue in the bud. It follows that company policy should make security requirements and processes part of the design phase of any project.

Privacy

At this point, let me highlight a series of documents and white papers produced by the Information & Privacy Commissioner of Ontario, Canada (IPCO), Dr. Ann Cavoukian, Ph.D.

The premise advocated by the IPCO is that of Privacy by Design (PbD). It goes into much more depth than is within the scope of this blog, but I encourage you to head over there and explore.

There are two sides to the equation: security for the professional IT people and privacy for the legal 'minds'. In essence, they are complementary and must exist together.

As a note here, one of the white papers on the site, 'Privacy and Security by Design: A Convergence of Paradigms', talks about what I am writing about here. It was released in January 2012.

I do have to make an admission to the reader. I started writing these blogs, and this one in particular, before I had any notion of this white paper's existence. When I did discover the PbD white papers, I realized the concepts, topics, and themes were similar to the issues I have explored in my blogs.

I will continue along this road next time. I will highlight examples of different forms of testing for security and ideas of privacy.