Close to the edge is, of course, the title track of one of Yes' best-selling albums. However, it is also an apt description of where analytic processing needs to be within many Internet of Things (IoT) environments.
The argument is this: imagine you are a mobile phone operator and that you want to identify dropped calls, both for network efficiency purposes and to reduce customer churn. In the normal run of events you stream all of your data from your phone masts back to some central location where all the data gets processed and analysed. The problems with this scenario are that a) there is an awful lot of data moving across the network (which has both a performance and a cost impact); b) you need a very large, heavy duty, expensive processing platform centrally to process all of the data; and c) there are latency issues because of all the movement of the data and its subsequent processing.
Now consider the alternative: you install analytic engines at each mast. Each of these processes the local data looking for dropped calls and, when it finds one, relays the relevant information to your central location. In this scenario you have more analytic processors but each is much smaller than the platform you would need if processing were centralised. You have also reduced network traffic and cost, and you have reduced or removed the latency.
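To make the mast-side filtering concrete, here is a minimal sketch in Python. The `CallEvent` record, its `status` values and the mast identifier are all assumptions made for illustration; a real mast would produce its own event format and a real engine would do rather more than filter on one field.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CallEvent:
    """Hypothetical call record produced at a mast."""
    call_id: str
    mast_id: str
    status: str  # assumed values: "completed" or "dropped"


def dropped_calls(events: Iterable[CallEvent]) -> Iterator[CallEvent]:
    """Yield only the events that are worth relaying to the central site."""
    for event in events:
        if event.status == "dropped":
            yield event


# Events seen locally at one (hypothetical) mast.
local_events = [
    CallEvent("c1", "mast-17", "completed"),
    CallEvent("c2", "mast-17", "dropped"),
    CallEvent("c3", "mast-17", "completed"),
    CallEvent("c4", "mast-17", "dropped"),
]

# Only the dropped calls cross the network; everything else stays at the edge.
relayed = list(dropped_calls(local_events))
print(f"relayed {len(relayed)} of {len(local_events)} events")
```

The point of the sketch is simply that the volume relayed is proportional to the number of dropped calls rather than to the total number of calls.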
So, you need analytics at the edge. But what sort of functionality do you need in an edge analytics platform? Well, that will depend on the application. You may just need to support streaming analytics. However, in many cases that will not be sufficient. You may need a database as well. There are a couple of reasons why that might be so. The first is that a part of what you are doing involves trend analysis, and you can't do trend analysis unless you have historic data which, needless to say, needs to be stored in a database. The second reason is to do with anomaly detection, which I need to discuss further.
There are three ways to do anomaly detection in real-time. The first is to use a complex event processing (CEP) engine that is rules-based. The problem with this is that the environment can become overly complicated: when you have hundreds of sensors on a single device then the number of rules required for anomaly detection becomes very high very quickly, with all that implies in terms of both initial development and maintenance. The second method is to use an event streaming platform that is not rules-based. Here you would typically either use predictive analytics via something like a PMML (Predictive Model Markup Language) model or run relevant SQL (or SQL-like) queries. The former implies that you have captured lots of historic information and analysed it centrally, but it is essentially about prediction rather than simply detecting anomalies. The latter assumes that you know what anomalies look like and have written the relevant queries, which is exactly the same problem as with rules.
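A small sketch may help to show why hand-written rules (or hand-written queries) struggle here. The sensor names and threshold bands below are invented for illustration; the structure, one explicit rule per sensor, is the point.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-written rules: one (low, high) band per sensor.
# With hundreds of sensors per device this table, and its upkeep, grows quickly.
RULES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (-10.0, 85.0),
    "vibration_mm_s": (0.0, 7.1),
    "pressure_kpa": (90.0, 110.0),
}


def violations(reading: Dict[str, float]) -> List[str]:
    """Return the sensors whose values fall outside their hand-written band."""
    flagged = []
    for sensor, value in reading.items():
        band = RULES.get(sensor)
        if band is None:
            continue  # no rule was written, so nothing can be detected
        low, high = band
        if not low <= value <= high:
            flagged.append(sensor)
    return flagged


reading = {
    "temperature_c": 91.5,   # outside its band
    "vibration_mm_s": 3.2,
    "pressure_kpa": 101.3,
    "humidity_pct": 140.0,   # implausible, but nobody wrote a rule for it
}
print(violations(reading))
```

The humidity value passes unnoticed because nobody anticipated it, which is the weakness that rules and pre-written queries share.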
The third method is to use streaming in conjunction with a database. This allows you to compare what has just happened with what has happened in the past to detect anomalies and will, of course, also support things such as predictive analytics. However, there are technical requirements of the database, in terms of performance and scalability, which mean that there are very few products that are capable of combining database and streaming data in real-time. Probably Sybase (SAP) IQ comes closest out of the mainstream vendors to achieving this but I know of only three products that can really do this well: Kdb+ from Kx Systems (see http://www.bloorresearch.com/research/indetail/kdb-internet-of-things-big-data/), Ancelus from Time Compression Strategies Corporation (see http://www.bloorresearch.com/research/indetail/ancelus/), and ParStream. Of these three, Kx has historically been focused on capital markets but has recently been moving into the IoT space, while Ancelus is focused on manufacturing and six sigma, though no doubt it will be focusing on Industry 4.0 in the future. ParStream, on the other hand, was designed specifically for edge computing and the IoT. As I have written previously about the first two of these but not the third, I will be writing a follow-up article to this specifically about ParStream.
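As a rough illustration of the streaming-plus-database approach, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the database and a simple z-score as the comparison with the past. Both are assumptions made for the example: none of the products mentioned above is being modelled here, and the table name, sensor name, threshold and minimum history length are all hypothetical.

```python
import sqlite3
import statistics


def make_store() -> sqlite3.Connection:
    """Create an in-memory store for historic readings (stand-in database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    return conn


def is_anomaly(conn: sqlite3.Connection, sensor: str, value: float,
               threshold: float = 3.0, min_history: int = 5) -> bool:
    """Compare a new value with the stored history for the same sensor."""
    rows = conn.execute(
        "SELECT value FROM readings WHERE sensor = ?", (sensor,)
    ).fetchall()
    history = [row[0] for row in rows]
    if len(history) < min_history:
        return False  # not enough past data to judge against
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


def ingest(conn: sqlite3.Connection, sensor: str, value: float) -> bool:
    """Score the incoming value against history, then store it."""
    flagged = is_anomaly(conn, sensor, value)
    conn.execute("INSERT INTO readings VALUES (?, ?)", (sensor, value))
    return flagged


conn = make_store()
stream = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
flags = [ingest(conn, "temperature_c", v) for v in stream]
print(flags)
```

No rule describing the anomaly was written in advance: the 35.0 reading is flagged only because it differs sharply from what the store has already seen. The sketch also keeps anomalous values in the history, which is a design choice; a real deployment might exclude them or age data out.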