BI for hybrid (big) data

Written By: Philip Howard
Published: 14 June, 2011
Content Copyright © 2011 Bloor

Most companies exploring the use of big data for business intelligence purposes do not simply want to analyse unstructured data; they also want to combine the results of that analysis with relevant structured data. They want their analytics to span all sorts of data, which we may refer to as hybrid data.

Unfortunately, NoSQL data stores are not really suitable for storing (and therefore analysing) structured data while conventional data warehouses are not very good at storing (and therefore analysing) unstructured data. As a result, the architecture that is emerging from a data warehousing perspective is that you store unstructured data in something like Hadoop, do basic analysis work on that data and create summary information that can be passed to the formal data warehouse, where that information can be further analysed. You can either do this through direct integration between the different environments or by means of a federated query environment (such as Composite Software’s) that supports Hadoop.
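To make the data flow concrete, here is a minimal sketch in plain Python of the pattern described above: a map/reduce-style pass over unstructured text produces summary rows, which are then joined with structured records of the kind a warehouse would hold. The sample data, the function names (`map_mentions`, `reduce_counts`, `join_with_warehouse`) and the keyword-matching rule are all illustrative assumptions; in practice the first stage would run in something like Hadoop and the second in the warehouse itself.

```python
from collections import defaultdict

# Hypothetical unstructured input: free-text support notes (would live in Hadoop).
NOTES = [
    "Customer complained that the Widget overheats after an hour",
    "widget arrived damaged, replacement sent",
    "Very happy with the Gadget, no issues",
]

# Hypothetical structured input: product rows as a warehouse might hold them.
PRODUCTS = [
    {"product": "widget", "units_sold": 1200},
    {"product": "gadget", "units_sold": 450},
]


def map_mentions(note, product_names):
    """Map step: emit (product, 1) for each known product named in a note."""
    words = {w.strip(".,").lower() for w in note.split()}
    for name in product_names:
        if name in words:
            yield name, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per product."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def join_with_warehouse(summary, products):
    """Combine the summary with structured rows, as the warehouse would."""
    return [
        {**row, "mentions": summary.get(row["product"], 0)}
        for row in products
    ]


names = [row["product"] for row in PRODUCTS]
pairs = [pair for note in NOTES for pair in map_mentions(note, names)]
summary = reduce_counts(pairs)           # summary information from the text
report = join_with_warehouse(summary, PRODUCTS)
```

Only the small `summary` dictionary crosses from one environment to the other, which is the point of the architecture: the bulk text stays where it is cheap to store, and the warehouse sees a compact, structured result.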

For large organisations this approach makes sense, but smaller companies with smaller budgets may have an issue with such a potentially expensive solution. One alternative is to store all the information in a warehouse such as that provided by Aster Data (Teradata) or Greenplum (EMC), both of which support native MapReduce capabilities. However, there are potential scalability issues if you try to do this. The real problem is that conventional BI tools do not support the analysis of both structured and unstructured data within the same query, which is what you would really like to do. Instead, you have to use MapReduce on the one hand and some SQL-based tool on the other.

However, that does not mean that suitable hybrid tools do not exist. In particular, Endeca Latitude and Connexica (previously ArdentiaSearch) CXAIR both support query capabilities that span structured and unstructured data. The two products have different implementations but the same basic philosophy, which is to extract structure from unstructured data and then combine that with directly structured data by means of indexes (search-based indexes, not database-style indexes). Both products are very easy to use (both companies place special emphasis on ease of use for end users) and both focus on allowing users to explore the data rather than just report on it.
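The general idea of extracting structure from text and indexing it alongside structured fields can be illustrated with a small inverted index. This is a generic sketch and makes no claim about how Latitude or CXAIR are actually implemented; the records, the order-number format and the names `extract_facets`, `build_index` and `search` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical records: structured fields plus a free-text comment.
RECORDS = [
    {"id": 1, "region": "north", "comment": "Order ORD-1001 delayed by courier"},
    {"id": 2, "region": "south", "comment": "Refund issued for ORD-1002"},
    {"id": 3, "region": "north", "comment": "Refund requested, order ORD-1003"},
]

ORDER_RE = re.compile(r"\bORD-\d+\b")  # assumed order-number format


def extract_facets(record):
    """Yield (field, value) terms from both structured and free-text parts."""
    yield "region", record["region"]                  # structured field
    for order in ORDER_RE.findall(record["comment"]):
        yield "order", order                          # structure pulled from text
    for word in re.findall(r"[a-z]+", record["comment"].lower()):
        yield "text", word                            # plain keyword terms


def build_index(records):
    """Build an inverted index: (field, value) -> set of record ids."""
    index = defaultdict(set)
    for record in records:
        for term in extract_facets(record):
            index[term].add(record["id"])
    return index


def search(index, *terms):
    """Return ids matching every (field, value) term (a simple AND query)."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(RECORDS)
# One query spanning a structured field and a word from unstructured text.
hits = search(index, ("region", "north"), ("text", "refund"))
```

Because structured values and terms derived from text share one index, a single query can mix the two, which is exactly what the SQL-plus-MapReduce split cannot do.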

However, they are rather different when it comes to their approach to the market. Specifically, Latitude is aimed at companies that want to develop analytic applications to support exploration of hybrid data, while CXAIR (which stands for ConneXica Ad-hoc Interactive Reporting) is aimed more at the traditional BI market, although the product is also being OEM’d by a number of third parties that have embedded the tool in their own products (in place of, for example, Crystal Reports). I expect to be writing more about Latitude and CXAIR in the future but, to go back to my initial point, it seems there is no one-size-fits-all solution to the problem of how to provide BI that spans hybrid data.

There is clearly a choice of warehousing architectures and, no doubt, the leading BI vendors will bolt on unstructured capabilities that will compete with the built-for-purpose technologies from Endeca and Connexica. Quite how all this plays out remains to be seen, but if you are interested in BI for hybrid data right now, you should check out Latitude and CXAIR.
