ParStream and the edge

ParStream is the name of both the company, which received Series A funding in 2011, and its database, ParStream DB. As mentioned in my previous article (Close to the edge), ParStream DB was specifically designed for edge analytics within Internet of Things (IoT) deployments. It consists of the database itself, about which more shortly, and a number of front-end capabilities to support analytics.

The product has an MPP (massively parallel processing), shared-nothing architecture and uses a scale-out methodology. Metadata is held on every server, so there is no master node, and queries are moved to where the data is rather than the other way around. As a result, the company claims to be able to support the processing of upwards of a million records per second on a single server (five million on five servers) through its streaming and in-memory capabilities. It also provides geographically distributed functionality, so that you can have multiple implementations in multiple places that all link together.
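To illustrate what "moving the query to the data" means in a shared-nothing design, here is a minimal Python sketch. It is not ParStream code: the node names, the `local_aggregate` and `distributed_average` functions and the sample readings are all hypothetical, and threads stand in for separate servers. Each node computes a small partial result over its own rows and only those partials travel back to be merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three shared-nothing nodes, each owning
# its own slice of the event data (nothing is shared between them).
NODES = {
    "node-a": [{"sensor": "s1", "temp": 20.0}, {"sensor": "s2", "temp": 30.0}],
    "node-b": [{"sensor": "s3", "temp": 25.0}],
    "node-c": [{"sensor": "s4", "temp": 35.0}, {"sensor": "s5", "temp": 40.0}],
}


def local_aggregate(rows):
    """Runs where the rows live and returns a small partial result."""
    return len(rows), sum(r["temp"] for r in rows)


def distributed_average(nodes):
    """Send the same aggregate to every node, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, nodes.values()))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else None


if __name__ == "__main__":
    print(distributed_average(NODES))
```

Only a (count, sum) pair leaves each node, so the amount of data moved does not grow with the number of rows stored. That is one reason this style of architecture can scale out roughly linearly.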

From an analytic perspective, the product uses SQL, and a major differentiator between ParStream and conventional streaming analytics platforms is that, because data is persisted, you can run queries across time ranges. For example, how often did xyz occur between these points in time? You can also, of course, do things like real-time trending. However, the most common requirement is for anomaly detection and alerting, which is achieved by comparing event data with historic data.
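The two ideas in this paragraph, time-range queries over persisted events and anomaly detection against history, can be sketched in a few lines of Python. This is an illustration only: it uses SQLite from the standard library rather than ParStream DB, and the `events` table, the `count_between` and `is_anomaly` functions, the sample readings and the three-standard-deviations rule are all assumptions made for the example.

```python
import sqlite3
from statistics import mean, stdev

# Hypothetical persisted event store: (timestamp, sensor, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, sensor TEXT, value REAL)")
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(ts, "pump-1", value) for ts, value in enumerate(readings)],
)


def count_between(conn, sensor, start, end, threshold):
    """How often did the value exceed threshold between start and end?"""
    cur = conn.execute(
        "SELECT COUNT(*) FROM events "
        "WHERE sensor = ? AND ts BETWEEN ? AND ? AND value > ?",
        (sensor, start, end, threshold),
    )
    return cur.fetchone()[0]


def is_anomaly(conn, sensor, new_value, k=3.0):
    """Flag a new event that lies more than k standard deviations
    from the mean of the stored history for that sensor."""
    history = [v for (v,) in conn.execute(
        "SELECT value FROM events WHERE sensor = ?", (sensor,))]
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(new_value - mu) > k * sigma


if __name__ == "__main__":
    print(count_between(conn, "pump-1", 0, 3, 10.0))
    print(is_anomaly(conn, "pump-1", 12.5))
```

A pure streaming engine that discards events after processing could not answer the first question at all, and could answer the second only by maintaining its own running statistics.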

So, how does ParStream achieve this? There are several parts to the answer. The first is that it uses highly compressed bit-mapped indexes, with “hot” indexes being cached in memory. The second is that there is no decompression required when you group by, sort, calculate minimum and maximum values, and so on. The third is that ParStream is lockless, and by this I do not mean that it simply implements optimistic locking. In fact, there is no such thing in ParStream as a single-row update or delete. Because the database isn’t intended to support transaction processing but only event data, which cannot be changed (it happened: period), the smallest element within ParStream DB is a partition. The end result is that writes do not impact reads and vice versa.
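For readers unfamiliar with bit-mapped indexes, the following Python sketch shows how filters, group-by counts and minimum/maximum values can be answered from the index alone, without reading the underlying rows. It is a deliberately simplified, uncompressed illustration: the `BitmapIndex` class and the sample columns are hypothetical, and it makes no attempt to reproduce ParStream's compressed format or its ability to operate on data without decompressing it.

```python
def popcount(bits):
    """Number of set bits, i.e. number of matching rows."""
    return bin(bits).count("1")


class BitmapIndex:
    """One bitmap per distinct value; bit i is set if row i holds it."""

    def __init__(self, values):
        self.bitmaps = {}
        for row, value in enumerate(values):
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row)

    def mask_for(self, value):
        """Bitmap of the rows whose column equals value."""
        return self.bitmaps.get(value, 0)

    def group_counts(self, mask):
        """GROUP BY this column with COUNT(*), restricted to rows in mask."""
        return {v: popcount(b & mask)
                for v, b in sorted(self.bitmaps.items()) if b & mask}

    def min_max(self, mask):
        """MIN and MAX of this column over the rows in mask."""
        hits = [v for v, b in self.bitmaps.items() if b & mask]
        return (min(hits), max(hits)) if hits else None


# Two hypothetical columns of the same six-row partition.
status = BitmapIndex(["ok", "warn", "ok", "fail", "ok", "warn"])
temp = BitmapIndex([20, 35, 21, 90, 20, 40])

if __name__ == "__main__":
    not_ok = status.mask_for("warn") | status.mask_for("fail")
    print(temp.min_max(not_ok))
    print(temp.group_counts(status.mask_for("ok")))
```

Filters combine with bitwise AND and OR, and every aggregate here is derived from the bitmaps themselves. Because rows are never updated in place, an index like this can be built once per partition and then read without any locking.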

ParStream has partnered with various vendors for visualisation purposes, including ThingWorx and Datawatch (Panopticon). There are facilities to develop your own analytic algorithms, and the company already supports Knime. R support will be introduced in the near future.

The other notable thing about ParStream DB is that it has a footprint of only 40 MB. The company has also recently launched an Intel-based appliance (a true appliance: pre-installed, pre-configured, ready to go) called the EdgeAnalyticsBox, specifically to support processing at the edge.

Finally, it is worth commenting that, despite being a relatively small company, ParStream already has customers (and significant customers at that) all over the world. The company looks well placed to capitalise on the forthcoming growth in the IoT.