Big data is all the rage. While we will address the question of what big data is, the real question is how it differs from the traditional world of analytics and data warehousing that we were familiar with just a couple of years ago. Historically, data warehouses catered for data that was originally transactional. As companies began to analyse other forms of data, such as clickstream data, these were converted into relational format (they are not intrinsically relational) so that they could be stored and analysed in a conventional warehouse. However, as the world has become increasingly instrumented with sensors, monitors and other devices, the volume of such data that needs to be stored has ballooned.
At the same time, companies have recognised that there is significant value to be obtained from analysing unstructured data (usually text but also video and audio) such as comments on message boards, tweets and the like. The problem with analysing this sort of data is that it is not easy to put it into relational format for analytic purposes: search is about the most you can hope for. Further, there is so much of this data that it would be prohibitively expensive to store it in a conventional manner. As a result of these considerations, companies like Google and Yahoo! developed new methods for storing and analysing this sort of data. As it turns out, these methods are also suitable for storing and analysing the instrumented data discussed in the previous paragraph.
While there are many options, the most popular method for handling big data is Hadoop. However, Hadoop has limitations: for example, it does not support ad hoc queries, it is not especially fast and it lacks enterprise-class data management capabilities. It is, on the other hand, much more cost-effective at managing non-relational data than a conventional data warehouse. For all of these reasons, enterprises are looking at how they can combine Hadoop with a traditional data warehouse. In this paper we will discuss not only what big data is and why you might want to deploy big data analytics, but also what sort of facilities you need when combining Hadoop with a traditional warehouse and, specifically, what facilities Sybase (an SAP company) offers in this respect.