In the first part of this piece, I raised some issues around data protection legislation and Big Data. Now I want to get some expert advice—as anyone starting a Big Data initiative should.
I asked Philip Howard of Bloor whether he'd encountered such issues in his Big Data practice. He noted that Facebook data was being made available for mining by third parties (see here), and he wondered whether Facebook or its prospective customers had considered the compliance and reputation risk associated with the latest EU data directive proposals. (You can find an overview guide to these from Robert Bond, Partner; Alexia Zuber, Solicitor; and Dominika Kupczyk, Data Protection Executive, all of Speechly Bircham's IP, Technology and Commercial Group, here; registration details needed.)
I guess that the answer, for Facebook, is "yes"—and that it didn't much like what it found—because it suggests changing the law here. As for its prospective customers, who knows?
I've been raising questions in these pieces so far and avoiding giving answers. If these questions might concern you, you should seek your own answers, in the context of your specific circumstances. This doesn't mean finding someone on the web who endorses whatever your current practice is or promises you a "get out of gaol free card" (gaol? let's not go there); nor does it involve asking your local database support technician about data protection. Neither "what the blogosphere knows" nor the opinions of DBAs or web techies on compliance law rate very highly in the courts. You need to actually read the directives and so on yourself, talk to your compliance people and get input from your legal advisers.
In that spirit, I talked to Bloor's compliance specialist, Peter Howes of Rite-Choice Ltd. He warns that the Article 29 working group issued a report earlier this month which definitely references big data—see "Annex 2: Big data and open data" on page 45 of the report here. He also suggests taking proper legal advice but says, "my take on this (as another person who is not legally qualified) is that, whilst there is an issue when you get to the Business Analytics and Intelligence, it is also a problem when the data is initially captured; even if the full scope of the details of analysis and intelligence inference are not fully defined at the time of capture". And, he confirms that, "as you point out, this is not a widely understood problem yet".
Peter also expands on the Facebook issue, which arises in part from the fact that, when the Data Subject (i.e. the Facebook customer) is in Europe, Facebook (et al) will in future have to comply with the European legislation wherever their data centres are located. "The main reason why the American companies are so worried is because of the provisions in the expected replacement legislation that are broadly referred to as the 'Right to be Forgotten'," he says.
"This 'Right to be Forgotten'," he says, "will have a major bearing on 'Big Data' inside EU as well as for US based service providers with the replacement Data Protection legislation and should be considered now and accommodated in the solution design (unless the organisation deploying a 'Big Data' scheme including personal data intends that the solution to only have a 2 or 3 year life)".
Ah, another exemption! So that's all right then? Well, no. As I implied before, relying on EU data protection exemptions is not really recommended; it is often cheaper and easier (and safer) to just comply anyway. This one is particularly risky. How do you prove that personal data will only be kept for 2–3 years? Do you have tested policies and procedures to ensure that its retention isn't extended, that none of it is copied to other systems with different policies, that none of it is kept on local hard drives and that "useful reports" on paper aren't kept past the limit? And what about backups and data retained for long periods for compliance purposes? As Peter points out: "no way would anyone today plan for that short a life or reasonably expect such a quick termination. And, if they did, they should expect the content to be somewhere in the organisation afterwards (not just the unauthorised retention, but also the normally retained backup)".
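To make the retention question concrete, here is a minimal sketch of the kind of check a retention policy implies. Everything in it is an assumption made for illustration: the three-year limit, the `DataCopy` record, the `overdue_copies` function and the sample inventory are all hypothetical, and it is no substitute for the legal and compliance advice recommended above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical limit: the "2 or 3 year life" discussed above, taken as 3 years.
RETENTION_LIMIT = timedelta(days=3 * 365)


@dataclass
class DataCopy:
    """One place where a copy of the data lives (hypothetical record)."""
    location: str              # e.g. "warehouse", "nightly backup"
    collected_on: date         # when the data in this copy was first captured
    holds_personal_data: bool  # whether the copy contains personal data


def overdue_copies(inventory, today):
    """Return every copy holding personal data older than the retention limit."""
    return [
        copy for copy in inventory
        if copy.holds_personal_data
        and today - copy.collected_on > RETENTION_LIMIT
    ]


# Illustrative inventory only; the locations and dates are made up.
inventory = [
    DataCopy("warehouse", date(2012, 1, 10), True),
    DataCopy("nightly backup", date(2009, 6, 1), True),
    DataCopy("analyst laptop", date(2008, 3, 15), True),
    DataCopy("aggregate report", date(2007, 1, 1), False),
]

for copy in overdue_copies(inventory, today=date(2013, 1, 1)):
    print(f"Past retention limit: {copy.location} (collected {copy.collected_on})")
```

The check itself is trivial. The hard part, as the questions above suggest, is building an inventory that really does list every backup, local copy and paper report, and that is an organisational problem rather than a coding one.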
Luckily, I know some lawyers too. Robert Bond is a Partner and Notary Public for Speechly Bircham LLP and a noted data protection expert. He says that "data protection laws apply to personal data in storage as well as in other uses such as analysis, research, sharing and transfer. An individual may impliedly consent to the use of personal data for purposes for which they reasonably anticipated use but not for unanticipated uses particularly profiling. In any event, if any personal data contains sensitive information such as health then consent needs to be more expressed than implied."
So, what that all means is that if you are starting a Big Data project you need to think about what data you are collecting and what uses you might want to put it to; and, if it is collected from or about people, whether you need their consent (either implied or express) before storing it, let alone using it. You need to become familiar with data protection legislation (both as it is now and as it is expected to evolve) and, probably, pay for legal advice on whether it impacts you, and how. Think about the cost of not complying, if you are caught: not only fines but also reputation risk, and the chance that the regulators see you as a likely "useful example" when the law next changes or a different regulation becomes high-profile. Then you have to estimate the cost of complying with data protection law (not complying shouldn't be considered as an option), including the cost of finding out about it, and make sure this is included in the ROI estimates for your Big Data project.
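As a rough illustration of why the compliance cost belongs in the ROI estimate, the sketch below compares a simple ROI figure with and without it. The `roi` function and every number are hypothetical, chosen only to show the arithmetic; real estimates would come from your own finance and legal people.

```python
def roi(benefit, project_cost, compliance_cost=0.0):
    """Simple ROI: net gain divided by total cost."""
    total_cost = project_cost + compliance_cost
    return (benefit - total_cost) / total_cost


# All figures are made up for illustration.
benefit = 1_500_000        # expected business benefit
project_cost = 1_000_000   # technology, staff, storage
compliance_cost = 200_000  # legal advice, consent handling, retention controls

naive = roi(benefit, project_cost)
realistic = roi(benefit, project_cost, compliance_cost)

print(f"ROI ignoring compliance cost:  {naive:.0%}")
print(f"ROI including compliance cost: {realistic:.0%}")
```

With these invented figures the estimate halves, from 50% to 25%, once compliance is counted. The sketch leaves out the cost of not complying (fines and reputation damage), which is much harder to put a number on.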
But look on the bright side. If you are thinking of collecting and storing vast amounts of data, using new and comparatively untested technology, with no very clear idea of what you'll use it for, or when, or for how long, isn't that looking like a very risky project? A bit of due diligence now, spurred on by some of the data protection issues discussed in my two papers, may focus you on why you are jumping on the Big Data bandwagon at all, what resources it is using and what business outcomes might justify the Big Data adventure. And that must be a good thing.