Streaming Analytics Platforms
Analyst Coverage: Philip Howard
Streaming analytics is a branch of analytics that is needed when a) results are required in real time and b) the volume of events or transactions is too large to be handled effectively using conventional technology. Most commonly (but not always: it depends on volume) the data is processed before it is stored, so that the delays associated with database update and query are avoided. Alternatively, the data may never be stored at all or, possibly, only aggregated information is stored.
A streaming analytics platform is therefore an environment that provides suitable performance to process anything from tens of thousands of events per second to millions per second, depending on the platform.
Historically, streaming analytics platforms are a development of complex event processing, which provided much the same capabilities but was targeted at algorithmic trading and similar environments within capital markets. Over the last few years the technology has become more oriented towards query processing, especially in the light of big data and the Internet of Things.
Traditional query techniques involve storing the data and then running a query against that data. However, ingesting and storing the data takes time. When there are very large amounts of data to be processed and the query latency requirements are very low, the overhead involved in landing the data in a database and then running a database query is too great. Streaming analytics works by having the data pass through a query during the ingestion process, thereby avoiding that overhead and providing much better performance.
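To make the idea of data "passing through a query" concrete, the following is a minimal sketch in Python rather than the syntax of any particular platform. It assumes a hypothetical event shape of (timestamp, key, value), assumes events arrive in timestamp order, and uses an illustrative function name, tumbling_sum. The query is a running total per key over fixed, non-overlapping time windows, and each result is emitted as soon as its window closes, without the events ever being written to a database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, key, value).
Event = Tuple[float, str, float]


def tumbling_sum(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, float]]]:
    """Yield (window_start, {key: total}) whenever a window closes.

    Assumes events arrive in timestamp order. Windows that contain
    no events are skipped rather than emitted as empty results.
    """
    window_start = None
    totals: Dict[str, float] = defaultdict(float)
    for ts, key, value in events:
        if window_start is None:
            # Align the first window to a multiple of the window size.
            window_start = ts - (ts % window_seconds)
        # Close every window that ended before this event arrived.
        while ts >= window_start + window_seconds:
            if totals:
                yield window_start, dict(totals)
                totals.clear()
            window_start += window_seconds
        totals[key] += value
    # Flush the final, still-open window when the stream ends.
    if window_start is not None and totals:
        yield window_start, dict(totals)


if __name__ == "__main__":
    sample = [(0.0, "a", 1.0), (0.5, "a", 2.0), (1.2, "b", 5.0), (3.1, "a", 4.0)]
    for start, result in tumbling_sum(sample, window_seconds=1.0):
        print(start, result)
```

Because the function is a generator, it holds only the current window's totals in memory, which is the property that lets this style of query keep up with data that is never stored. A production platform would also have to deal with out-of-order and late events, which this sketch deliberately ignores.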
However, it is not always as simple as just passing the data through a query. It may be more a question of pattern recognition, whereby a series of correlated events together meet or fail to meet an expected pattern. For example, credit card fraud detection is a common use case for this technology and the same is true for the identification of “low and slow” attacks against corporate infrastructures.
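As an illustration of pattern recognition over correlated events, the sketch below flags a card that is used in two different countries within a short time window. The rule itself, the (timestamp, card_id, country) event shape and the function name flag_country_hops are all assumptions made for the example. Real fraud detection combines many such patterns, and this is not how any specific product implements them.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical transaction shape: (timestamp in seconds, card_id, country).
Txn = Tuple[float, str, str]


def flag_country_hops(
    txns: Iterable[Txn], window_seconds: float
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, card_id) when a card appears in a different
    country from one it was used in within the last window_seconds.

    Assumes transactions arrive in timestamp order.
    """
    recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
    for ts, card, country in txns:
        history = recent[card]
        # Discard transactions that have aged out of the window.
        while history and ts - history[0][0] > window_seconds:
            history.popleft()
        # The pattern matches if any recent use was in another country.
        if any(prev_country != country for _, prev_country in history):
            yield ts, card
        history.append((ts, country))


if __name__ == "__main__":
    sample = [
        (0.0, "c1", "GB"),
        (10.0, "c2", "US"),
        (30.0, "c1", "FR"),   # same card, different country, 30s later
        (500.0, "c1", "GB"),  # outside the window, so not flagged
    ]
    for ts, card in flag_country_hops(sample, window_seconds=60.0):
        print("suspicious:", card, "at", ts)
```

The point of the example is that the decision depends on a series of related events rather than on any single one, so the engine has to keep a small amount of state per card while the stream flows past. A "low and slow" attack has the same shape, but with a much longer window.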
Streaming analytics is about real-time analysis of large volumes of data. There are lots of potential use cases in telecommunications, smart meters and monitoring applications of various sorts, fraud detection and so on, so it is often of interest to governance and control departments.
Some solutions in this space (those that can store data) may be able to support functions such as real-time trending, which will be useful in some environments.
For a long time complex event processing was looking for significant numbers of use cases outside capital markets. The advent of big data and the Internet of Things has provided just such opportunities. Thus the main trend in the market is the shift away from complex event processing and towards streaming analytics. It is also notable that there are a number of open source initiatives around streaming: for example, Apache Spark.
The early leaders in the complex event processing space have largely been acquired by the major vendors. However, newer companies such as SQLstream have emerged to challenge the hegemony of the 800lb gorillas. Nevertheless, it is likely that this market will be dominated by the major players.