CEP and big data

Content Copyright © 2011 Bloor. All Rights Reserved.

Red Lambda recently came to market with a new SIEM (security information and event management) product. Its distinguishing feature is that it has a CEP (complex event processing) engine at the front end because, in the company’s own words, “log and security data is a big data problem”. I have been preaching this, if not in so many words, for the last couple of years, but the only other SIEM vendor I know of that uses CEP is Tier-3, though I also know that SAS has been doing work in this area (for example, for real-time identification of ‘low and slow’ attacks, which are notoriously difficult to detect).
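To see why ‘low and slow’ attacks defeat conventional per-minute thresholds, consider a CEP-style rule that correlates failed logins per source over a long window. This is a minimal sketch of the general idea only; the event names, thresholds and logic are invented for illustration and are not Red Lambda’s or SAS’s actual technique.

```python
from collections import defaultdict, deque

WINDOW = 24 * 3600   # look back 24 hours (assumed for illustration)
SLOW_GAP = 300       # failures 5+ minutes apart evade per-minute rules
THRESHOLD = 20       # this many failures in the window is suspicious

class LowSlowDetector:
    def __init__(self):
        self.failures = defaultdict(deque)  # source -> failure timestamps

    def on_event(self, ts, source, outcome):
        """Feed one log event; return True if the source looks low-and-slow."""
        if outcome != "login_failed":
            return False
        q = self.failures[source]
        # Discard failures that have aged out of the window.
        while q and ts - q[0] > WINDOW:
            q.popleft()
        q.append(ts)
        if len(q) < THRESHOLD:
            return False
        # 'Slow': the average spacing between failures is wide enough
        # that a naive per-minute rate rule would never have fired.
        span = q[-1] - q[0]
        return span / (len(q) - 1) >= SLOW_GAP

detector = LowSlowDetector()
# One failed login every ten minutes from the same address: far below
# any rate alarm, yet the long-window correlation eventually trips.
alerts = [detector.on_event(t * 600, "203.0.113.9", "login_failed")
          for t in range(30)]
print(alerts.index(True))   # prints 19: the 20th slow failure raises the flag
```

The point is that the state (per-source windows) has to live in the stream processor itself, which is exactly what a CEP engine in front of a SIEM provides.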

More generally, the CEP market segment is characterised by two use cases: supporting algorithmic trading and other functions within capital markets, and supporting business process management (BPM) and improvement within SOA (service oriented architecture) environments. So Progress, StreamBase and Sybase are in the former camp, while Oracle, Progress (again) and TIBCO are in the latter, along with IBM’s WebSphere Business Events.

What is surprising to me is that, with the exception of IBM, none of the other vendors that have long-established CEP products seems to have recognised the wider truth in Red Lambda’s statement: that CEP is the solution to many of the issues raised by big data, especially where that is instrumented data arising from sensors, logs (web, security or otherwise), smart meters, RFID, GPS or similar. That’s not to say that they haven’t dabbled with employing their products in other environments but they certainly aren’t, at least to my knowledge, targeting ‘big data’ as a generic issue.

The key, to my mind, is to be able to support data mining techniques to build a relevant model (or models) and then score incoming events against that model in real-time. For example, Red Lambda has implemented a data mining technique that it calls Neural Foam, which is like a neural network except that it doesn’t require training (note: in case you are not into data mining, this is a seriously important feature). More generally, you would want support for PMML (Predictive Model Markup Language), which is the industry standard for porting data mining models: so you can build a suitable model in your warehouse, port it to your CEP engine and then, as I say, score in real-time.

This is what IBM’s InfoSphere Streams does and, given Oracle’s and TIBCO’s presence in the business intelligence space (in the latter case with Spotfire), you might think that they would do so too. In particular, by continuing to focus on CEP as a corollary to BPM and SOA they are leaving the field wide open for IBM. In the case of Oracle this seems very un-Larry-like.

More particularly, IBM is deploying Streams to augment its data warehousing offerings. Not only are there Streams installations in conjunction with DB2, but IBM already has Streams being used in conjunction with Netezza. In other words, it is easy to conceive of situations where Streams might make the difference between a customer licensing Exadata and/or the Oracle Big Data Appliance on the one hand and an IBM database or BigInsights (which integrates with Streams) solution on the other. As I said, most un-Larry-like to hand a major competitor such an advantage. Still, I’m sure that IBM is happy enough to have the playing field to itself.

Finally, just to return to SIEM for a moment: what’s the betting that Streams will be integrated with QRadar, which IBM gained with its acquisition of Q1 Labs? If anyone wants to place a bet to the contrary, please let me know: I’ll be happy to take your money. And that doesn’t spell good news for any of the other SIEM vendors out there: Tier-3 and Red Lambda may be in the game, but everyone else (unless they’re quick) is going to be left with an out-of-date architecture. But then, I think that’s where they are already.