Continuous Business Intelligence and Big Data - Using CEP for real-time continuous queries

Content Copyright © 2011 Bloor. All Rights Reserved.

I have written previously about how the major vendors of CEP (complex event processing) solutions are still primarily focused on either capital markets or supporting business processes and SOA, and are not really focusing on business intelligence and analytics per se. The exceptions are IBM and StreamBase. That lack of focus does not, however, extend to the operational and big data markets. In the former area, Tier-3 and Red Lambda both use CEP for SIEM (security information and event management) while AccelOps employs it for data centre monitoring.

However, it is on big data more generally that I want to focus. Here there are a number of new and not so new ventures that provide what is becoming known as “continuous BI”. First, it is worth explaining what continuous BI is. Normally, when you run a query that’s it: it’s a one-time shot. If you want to run the same query again then you have to activate it again. A continuous query, on the other hand, is one that you fire up and, once instigated, it continues to run until you tell it not to. What happens is that the results of the query change as new data passes through the query. It is, if you like, a sort of sophisticated real-time monitoring that doesn’t simply record what has happened but supports complex analysis and pattern matching (and, of course, simpler things) against the incoming data.
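To make the distinction concrete, here is a minimal sketch in Python. It is purely illustrative and is not how any of the products mentioned in this article work: the names one_shot_query and ContinuousQuery, the three-event window and the sample readings are all assumptions of mine. The one-shot query is evaluated once over whatever data exists at the time; the continuous query is registered once and then emits a fresh result every time a new event passes through it.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def one_shot_query(rows: Iterable[float]) -> float:
    """A conventional query: evaluated once over the data available now."""
    values = list(rows)
    return sum(values) / len(values)


class ContinuousQuery:
    """A standing query (hypothetical): registered once, re-evaluated per event.

    It keeps a sliding window of the most recent readings and reports the
    window average to a callback each time a new reading arrives.
    """

    def __init__(self, window_size: int, on_result: Callable[[float], None]):
        # deque(maxlen=...) silently drops the oldest reading when full.
        self.window: Deque[float] = deque(maxlen=window_size)
        self.on_result = on_result

    def push(self, value: float) -> None:
        """Feed one new event through the query and emit the updated result."""
        self.window.append(value)
        self.on_result(sum(self.window) / len(self.window))


# Register the query once; results accumulate as events stream in.
results: List[float] = []
query = ContinuousQuery(window_size=3, on_result=results.append)
for reading in [10.0, 20.0, 30.0, 40.0]:  # stand-in for a live feed
    query.push(reading)
```

A real CEP engine would add time-based windows, pattern matching across multiple streams and a declarative query language, but the principle is the same: the query stays put and the data moves through it, rather than the other way round.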

Now, this is typically a big data problem, but it won’t work simply to load the data into Hadoop and then query it: that won’t be continuous. So you need some sort of CEP engine to do it for you. Four such products are (or will be; they are not all commercially available yet) S4 from Yahoo!, which is part of the Apache Incubator programme; Storm from BackType Technology, which describes its product as “The Hadoop of Real-time Processing”; DarkStar from Cloud Event Processing, which provides CEP-based streaming MapReduce; and HStreaming (from the company of the same name). As its company name implies, DarkStar is cloud-based, and HStreaming also offers a cloud-based version.

No doubt all of these products have their merits. However, I have two issues with all of them. Firstly, as far as I can tell (their web sites are singularly uninformative), these are all platforms, and unless you are a geek you don’t want a platform for BI; you want a solution. That means a user interface where business people can define queries, either for prolonged use or on an ad hoc basis, together with visualisation and graphics that can display the results of those queries to them. What these products all offer (and the same is true for IBM Streams) is a platform on which developers can build query capabilities. But I don’t think that’s enough and it’s certainly not what end-users want. The only company that has actually built this sort of interface on top of a CEP platform (or, more precisely, is starting to build it: this is only version 1), as far as I know, is StreamBase, with its recently released LiveView product. I’ll discuss this in detail in a separate posting, but suffice it to say that, in so far as I can tell from their web sites, it is a mile ahead of all these other vendors in offering continuous BI (and, yes, I need to do some more research and, yes, I will report back having done so).

The second problem I have with these vendors is that, yes, it is a good idea and, yes, there is a gap in the market, but I don’t think that gap will last long. Certainly, I would expect IBM to integrate Cognos on top of Streams; SAS will be entering the market next year; it would be reasonable to expect Sybase Aleri to be integrated with Business Objects; TIBCO could do the same thing with Spotfire; and obviously Oracle also has that capability. None of this means that there won’t be a market for open source and smaller suppliers, but they are going to have to be careful about where they put their marketing dollars. One can imagine that Pentaho and JasperSoft might be interested in integrating with, and potentially acquiring, these vendors. No doubt an ISV market will also emerge but this may take some time.

As with all things associated with big data, continuous BI is an immature market and we will have to wait to see what happens, but I expect the big boys to jump on this with both feet.