Considering the small in big data

The big vendors will tell you that big data is all about the three Vs: volume, velocity and variety, not to mention other sorts of V like value and veracity. They are so keen on Vs I’m surprised they don’t use posters of a Guy Fawkes mask.

To be fair, there is some truth behind the Vs but it’s not the whole truth, particularly when it comes to variety. After all, combining data that is in different formats and from different sources is hardly a new requirement and you don’t need to go anywhere near Hadoop or anything else associated with big data to resolve it. Well, let me qualify that: if you want to do the sort of things that data scientists are supposed to be doing then OK, but if all you want to do is combine data from multiple ERP implementations into a single reporting environment then you don’t need to go as far as any sort of big data solution. And the same applies to combining quantitative data from a PDF document with data from a spreadsheet, or to reporting that spans both SaaS and in-house applications.
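To make the point concrete, here is a minimal sketch in Python, using nothing beyond the standard library, of combining two differently formatted exports into one reporting view. Everything in it is an assumption made for illustration: the two inline CSV samples, the column names, the decimal-comma format in the second export, and the premise that both sets of amounts are in the same currency.

```python
import csv
import io

# Hypothetical exports from two ERP systems. Column names, delimiters and
# number formats differ, which is the "variety" problem in miniature.
ERP_A = "customer,amount_usd\nAcme,1200.50\nGlobex,300.00\n"
ERP_B = "Client;Total\nAcme;99,50\nInitech;45,00\n"  # semicolons, decimal comma


def read_erp_a(text):
    """Yield rows from the comma-delimited export in a common shape."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "source": "erp_a",
            "customer": row["customer"],
            "amount": float(row["amount_usd"]),
        }


def read_erp_b(text):
    """Yield rows from the semicolon-delimited export in the same shape."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield {
            "source": "erp_b",
            "customer": row["Client"],
            # Convert the decimal comma to a decimal point before parsing.
            "amount": float(row["Total"].replace(",", ".")),
        }


def totals_by_customer(rows):
    """Sum amounts per customer across all sources."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


combined = list(read_erp_a(ERP_A)) + list(read_erp_b(ERP_B))
report = totals_by_customer(combined)
print(report)
```

The only real work is agreeing on one common shape for a row and writing a small reader per source; nothing here needs a cluster.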

For some such types of enquiry you can use data virtualisation but it will also often be the case that you don’t want to get into that sort of technology, especially if you are operating outside of a data warehousing environment. An alternative is to use the sort of capabilities that Datawatch offers. This actually isn’t a million miles away from data virtualisation, at least in the sense that you start by building a model of how all the data fits together, but then you physically move the data (there are built-in facilities to do that automatically), after which you can run any number of queries against it.
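The general pattern of modelling first, then physically moving the data, then querying it can be sketched in a few lines of Python with the standard sqlite3 module. This is not how Datawatch works internally; it is only an illustration of the idea, and the model dictionary, the sample rows and the table name are all hypothetical.

```python
import sqlite3

# Hypothetical model: how each source's fields map onto one shared table.
MODEL = {
    "erp_a": {"customer": "customer", "amount": "amount_usd"},
    "erp_b": {"customer": "Client", "amount": "Total"},
}

# Hypothetical source rows, already parsed from wherever they live.
SOURCES = {
    "erp_a": [
        {"customer": "Acme", "amount_usd": 1200.5},
        {"customer": "Globex", "amount_usd": 300.0},
    ],
    "erp_b": [
        {"Client": "Acme", "Total": 99.5},
        {"Client": "Initech", "Total": 45.0},
    ],
}


def load(conn, model, sources):
    """Physically copy every source into one table, guided by the model."""
    conn.execute("CREATE TABLE sales (source TEXT, customer TEXT, amount REAL)")
    for name, rows in sources.items():
        mapping = model[name]
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [(name, r[mapping["customer"]], r[mapping["amount"]]) for r in rows],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, MODEL, SOURCES)

# Once the data has been moved, any number of queries can run against it.
by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
by_source = conn.execute(
    "SELECT source, COUNT(*) FROM sales GROUP BY source ORDER BY source"
).fetchall()
print(by_customer)
print(by_source)
```

The contrast with data virtualisation is in the load step: the rows are copied once into a local store, so later queries do not touch the original systems.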

Datawatch offers an interesting alternative approach. I’m going to be speaking at a webinar they are hosting on 23rd October. If you are interested, here’s the link: http://at.datawatch.com/webcast-BloorResearch-102313?mc=BloorWeb.