Consuming big data, Advizor and EMC Greenplum

There are many who still view Big Data as no more than “snake oil”, another spin from an industry desperately seeking “the next big thing”, and, to a certain extent, the hype does seem, at times, to run ahead of the reality. It therefore comes as a refreshing contrast to listen to a webinar that is clearly rooted in the here and now and that illustrates, with real world examples, not only what people are using Big Data for but also how, with Advizor, those results are being presented. This matters because, even with an audience that was very largely based in North America, where adoption is further along than in EMEA, people are still wary of the cost of getting involved in Big Data and of the skills required to set up the infrastructure, run the queries, interpret the results, and turn insight into actionable, profitable results. What this webinar did was to show that Big Data is not some mysterious, esoteric step into the unknown; instead it is quite clearly the next logical step in our quest to understand multivariate issues. If we can get better answers to the questions we ask every day, we will address so many of our concerns, from churn reduction in a Telco, to risk reduction in an insurance company, to understanding trends in the financial markets, to outcomes from illness or genetics, to when people will buy dips to go with their potato chips.

The key message from Advizor and Greenplum is that there are four key technologies that affect our ability to consume Big Data in ways that are affordable, accessible and actionable: In Memory analysis, to speed the return of results; Visualisation, to help us take in patterns that would be impenetrable if reported as tabular data; Social software, to extend the reach of our understanding beyond just transaction history; and Search, to change the way that we find what is important. These, alongside the commoditisation of the technology, which reduces the cost and makes the ROI of getting into Big Data compelling, are, to me, the cornerstones of Big Data and Agile BI, the two things that seem to be creating interest at present.

When EMC invested in Greenplum they were buying into a platform that can handle the volume of data that can now be aggregated from all of the sources of insight available today, which, together, represent the volume that puts the ‘big’ into Big Data. Critically, the Greenplum platform is based on commodity technology, the same Intel chips that are used in PCs. Above that sit the Greenplum and Hadoop data stores, which hold the data. A data access and query layer then exposes that data to a layer of tools and services that open it up to exploitation and the creation of value. This is where Greenplum’s innovation is really clever: they have their Catalytics programme to work with partners to build the tools and services required, and they provide Chorus, a collaborative analytic productivity layer, to bring all of those parts together and expose them to the business audience in a manageable way.

Advizor is one of those partners within the Catalytics programme. I have been aware of Advizor for some time because, like Doug Cogswell of Advizor, I have been a long-time fan of KXEN as a data mining tool. KXEN, which is embedded into Advizor, provides the power of an SPSS or SAS, but in a way that allows a non-statistician to build a predictive model just as good as one built with the traditional tools, without needing a PhD to get there. Advizor’s role in the Big Data story is to take the results from the lower level tools, such as a sentiment engine analysing Tweets to see if people are positive, indifferent or negative about a subject, and present them in a visual way that allows a business user, such as a Product Manager in the FMCG market, to understand what the Big Data is telling them. Again, I find that many people are still not comfortable with what Visualisation is really about, and we all know that fear of the unknown tends to inhibit adoption. This is why webinars like this are so valuable: you can actually see the tools in action with real world data and see the results. These can, for instance, use colour coding to represent sentiment and the size of a block to represent influence, so that a complex interaction of factors is presented in a way that conveys real value and is ready for consumption.

I can really recommend that anyone looking to take their first steps in Big Data check out both Greenplum and Advizor. This webinar, which is now posted on the Advizor website, is a great place to start.