Discussions about Big Data and the analytics that surround it have tended to be dominated by Hadoop and MapReduce, with the emphasis placed on the 'big' aspect. However, we have been doing big for a long time: Telco CDR analysis, Retail Banking transaction analysis and Retail ePOS analysis are all examples of big by any definition. But when you take Big to really mean diverse, it becomes really interesting, and you start to see things that are truly big changes in paradigm and not just a form of natural progression and realigned economics.
For me, the truly disruptive technology advance has been the analysis of text-based data, rather than numeric data. Numbers make analysis easy: they have a universally agreed context and their magnitude is understood universally. Words are not like that; context changes the meaning that can be derived from a word. Just think of what the word "wicked" means to me and to my teenage daughter.
Not so long ago we started to mine text, and it provided really interesting results straight away. At that time I worked for HP and, for a SAS SEUGI conference, we used SAS text mining to look at engineer and customer feedback notes from the call centre, and at the install notes, for a number of newly released products. We were surprised to discover a number of product issues before the product engineers became aware of them.
The progress in analysing words has been staggering, the most mind-boggling example being the IBM Watson project winning a quiz show in the US. This is analysis of natural language (Watson received its clues as text rather than audio) combined with detection of context and data retrieval in real time, and it was only a few years ago that I was being amazed at technology that merely allowed speech recognition to be conducted at faster than real time.
Of all the uses of text analysis, the one that I believe has the most obvious ability to deliver a real boost to the bottom line is social media analysis. Over the last few years, on many of the projects I have been involved in that used customer surveys, it has become obvious that the panels provide answers at odds with the behaviour they actually display. They give the answers that they think you want to hear rather than what they really think, and the younger the audience, the more sophisticated they appear to be in answering in ways that tell you anything but the truth.
The killer app of big data is the detailed analysis of customer behaviour, and one of the richest sources of information about what people really think is to be found in social media, be that Twitter, Facebook or YouTube etc. When it comes to the analysis of such data, it is not enough to just have clever technology - you also need the skills that come from a deep knowledge and experience of analytics and data handling.
The quality of analytics for social media is going to be fundamentally based upon an ability to collect data effectively, to process data effectively and to present results in ways that suit the needs of the audience. Allied to that is an absolute: the quality of the analytics is fundamental, and simplistic analytics will provide simplistic results. When it comes to doing all of these things well, the market leaders are going to be the established players who have a track record of doing so over the years and who can provide the technology and the know-how to deliver reliable results. Here SAS remain a market leader, not through scale or longevity but through expertise and excellence.
Text analytics requires careful tuning: you have to be able to identify the critical things that you want to look at and separate them from the noise that surrounds them. A taxonomy gives structure to what is seemingly unstructured data: nouns are hierarchically associated with topics, and the verbs and adjectives associated with those nouns can then be analysed for sentiment.
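To make the idea concrete, here is a minimal Python sketch of taxonomy-driven sentiment scoring. It is not how SAS or any other product does it; the taxonomy, the sentiment lexicon, the `score_topics` function and the fixed word window are all assumptions invented purely for illustration. Each noun is mapped to a path of topics, and any sentiment-bearing word found within a couple of words of that noun is credited to every level of the path.

```python
import re
from collections import defaultdict

# Hypothetical taxonomy: each noun maps to a path of topics, broadest first.
TAXONOMY = {
    "battery": ("product", "hardware"),
    "screen": ("product", "hardware"),
    "delivery": ("service", "logistics"),
    "courier": ("service", "logistics"),
}

# Hypothetical sentiment lexicon for adjectives and verbs.
SENTIMENT = {
    "great": 1, "love": 1, "fast": 1,
    "poor": -1, "broken": -1, "late": -1, "hate": -1,
}


def score_topics(text, window=2):
    """Sum the polarity of words near each taxonomy noun, per topic level."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = defaultdict(int)
    for i, token in enumerate(tokens):
        if token not in TAXONOMY:
            continue  # noise: not a noun we have chosen to track
        # Words within `window` positions either side of the noun.
        nearby = tokens[max(0, i - window): i + window + 1]
        polarity = sum(SENTIMENT.get(word, 0) for word in nearby)
        # Credit the polarity to every level of the topic hierarchy.
        path = TAXONOMY[token]
        for depth in range(1, len(path) + 1):
            scores["/".join(path[:depth])] += polarity
    return dict(scores)


print(score_topics("The battery is great but the delivery was late"))
```

A flat lexicon like this cannot tell my "wicked" from my daughter's, which is exactly why the tuning matters; a real system would need proper part-of-speech tagging and context handling in place of the simple word window assumed here.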
As the high street and the shopping mall are increasingly supplemented by the internet, and a whole ecosystem builds up around the likes of Facebook, with its ability to build self-selecting audiences of interested parties and a direct channel to them, such analysis will be the battleground for commercial success. Winners and losers will be determined by a detailed analysis of how sentiment evolves over time in response to different stimuli. The analysis of that data must be reliable and timely, and must be fed back to the audience in a readily assimilable format. As that data is the lifeblood of an organisation, all functions from marketing to logistics will want to know what is going on, each with their own spin on what data they want and how they want it presented. Whilst there will be many point solutions offered to this growing market, there will only be a handful of vendors capable of offering everything that is needed and I, for one, would bet that SAS will be leading that group.
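Tracking how sentiment moves in response to a stimulus can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the messages are taken to be already scored (by whatever text analytics is in use), the function names `daily_sentiment` and `shift_around` are hypothetical, and the dates and scores in the usage lines are made up. The sketch averages scores per day, then compares the mean daily sentiment before and after a chosen stimulus date, such as a product launch or a campaign.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment(scored_messages):
    """Average the sentiment scores of (day, score) pairs for each day."""
    by_day = defaultdict(list)
    for day, score in scored_messages:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def shift_around(daily, stimulus_day):
    """Mean daily sentiment from the stimulus onwards, minus the mean before it."""
    before = [score for day, score in daily.items() if day < stimulus_day]
    after = [score for day, score in daily.items() if day >= stimulus_day]
    if not before or not after:
        return None  # nothing to compare against
    return mean(after) - mean(before)


# Made-up example data: (day, sentiment score) for individual messages.
messages = [
    (date(2024, 1, 1), 1), (date(2024, 1, 1), -1),
    (date(2024, 1, 2), -1),
    (date(2024, 1, 3), 1),
    (date(2024, 1, 4), 1),
]
daily = daily_sentiment(messages)
print(shift_around(daily, date(2024, 1, 3)))
```

A simple before-and-after difference like this says nothing about cause, and different functions in the organisation would want the same daily series cut and presented in their own way.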
The important thing to remember is that, although I have labelled this as social media analysis, the technology is applicable to all sources of textual data. That includes call centre note fields and customer emails, all valuable sources of insight, and that insight can be used to create market opportunities, save wasted expenditure, improve the customer experience and so forth. This is why Big Data gets people excited, and this is why the traditional experts, like SAS, continue to have a compelling argument for inclusion in any work in this arena.