Big data summary

I wasn’t originally intending to write a sixth article in this series, but I think it is worth a brief, formal wrap-up. I have discussed context, trust, security, ethics and integration, but the overarching message is that big data environments in many ways mirror conventional data environments: they are not different in substance, only in kind. However, users have to have confidence both in the data itself and in the conclusions that are drawn from analysing that data: otherwise, actions that should be taken will not be taken.

Having confidence means trusting that the quality of the data is sound and that the context within which the data has been analysed is also valid. In other words, you need data governance just as much for big data as for conventional data. And, while some aspects of that governance will be simpler than they are for transactional data, the environment as a whole will be more complex (different strokes for different data folks) and will require a more agile approach. The same applies to the integration environment, where different approaches will be required depending on what is to be integrated.

Now, I don’t mean to put you off. There are clearly lots of benefits to be derived from big data. However, it does need to be thought through: it is not simply a question of setting up a Hadoop cluster, throwing some data at it and waiting for the data scientists, who will of course be infallibly correct, to tell you what to do. If this were the right approach, they would be better called data magicians. They aren’t, and the truth is that “scientist” in this context is a misnomer: what they do is as much of an art as it is a science, and it needs to be treated in that context, with governance, management and understanding.