System S

I have previously discussed why I think that System S is unique. Now I want to put some meat on the bones.

System S has been the name of an IBM research project for some years, and last September the company announced that it would release it in 2010 under the name InfoSphere Streams. However, work with initial users has persuaded the company to release the product now, as part of the InfoSphere product line, albeit only on Intel platforms. Ultimately it will run on all sorts of platforms, including IBM supercomputers.

System S is aimed at environments with extreme requirements for high-volume input, low latency and complexity, though any two of these in combination may suffice. And when I say “extreme” I really mean it: in this release you can run System S on a cluster of up to 125 servers with 8 cores per server, and you don’t need much imagination to understand what sort of performance you can get from a 1,000-core system.

There are a number of key features of the system. These include SPADE (Stream Processing Application Declarative Engine), which is based on a purpose-designed scripting language for handling streaming or high-volume event data; a monitoring capability so that you can view streams; the use of IBM solidDB as an in-memory database (which IBM reports as being an order of magnitude faster than using DB2 with in-memory cache for relevant applications); and the ability to assign system resources by query type (a sort of mixed query workload capability) so that, for example, you can have specialised blades dedicated to video streaming, or FPGAs where appropriate, and so on.
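To give a feel for the kind of continuous query a stream processing language is designed to express, here is a minimal sketch in Python. It is not SPADE syntax and it does not use any IBM API: the function name sliding_average, the event layout of (key, value) pairs and the sample ticks are all assumptions made purely for illustration. The point is that results are emitted as each event arrives, rather than after the data has been stored and queried.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, Tuple


def sliding_average(
    events: Iterable[Tuple[str, float]], size: int
) -> Iterator[Tuple[str, float]]:
    """For each incoming (key, value) event, yield the mean of the
    last `size` values seen for that key (hypothetical operator)."""
    windows: Dict[str, deque] = {}
    for key, value in events:
        # One bounded window per key; old values fall off automatically.
        window = windows.setdefault(key, deque(maxlen=size))
        window.append(value)
        yield key, sum(window) / len(window)


# Illustrative input only: a tiny stream of (symbol, price) events.
ticks = [("IBM", 100.0), ("IBM", 102.0), ("XYZ", 10.0), ("IBM", 104.0)]
results = list(sliding_average(ticks, size=2))
```

Because the operator is a generator, it holds only the current windows in memory and could in principle be fed from an unbounded source; a production system would distribute many such operators across the cluster.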

Of course, this is a first release, so there are some imperfections in the System S offering. For example, there is no fault tolerance yet, though there is state monitoring, so that you can roll back to the last recorded state and continue running on another node from that point. There is also no integration with WebSphere Premises Server, which is the company’s platform for processing and filtering sensor data, and no non-IBM adapters are available other than ODBC, though support for non-IBM message buses is planned. Perhaps the biggest weakness will be in environments where you need to combine complex historic analytics with incoming data. You can use DB2 or IDS in conjunction with solidDB, which will meet most requirements, but there may be cases where the performance of DB2 or IDS leaves something to be desired, and here you would have to use the ODBC adapter to connect to a specialised analytics product from another vendor.
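The roll-back-and-resume idea mentioned above can be sketched generically as follows. This is an assumption-laden illustration, not a description of how System S implements it: the CountingOperator class, its snapshot and restore methods, the checkpoint interval and the simulated failure are all hypothetical, and the sketch assumes the input can be replayed from a known offset.

```python
class CountingOperator:
    """Hypothetical stateful operator that keeps a running total."""

    def __init__(self):
        self.offset = 0  # number of events consumed so far
        self.total = 0

    def process(self, value):
        self.total += value
        self.offset += 1

    def snapshot(self):
        # Record enough state to resume elsewhere.
        return {"offset": self.offset, "total": self.total}

    @classmethod
    def restore(cls, state):
        op = cls()
        op.offset = state["offset"]
        op.total = state["total"]
        return op


events = [5, 3, 8, 2, 7, 1]  # illustrative, replayable input

# "Node A" runs, recording its state after every second event.
node_a = CountingOperator()
checkpoint = node_a.snapshot()
for i, value in enumerate(events):
    if i == 3:  # simulated loss of node A before the fourth event
        break
    node_a.process(value)
    if node_a.offset % 2 == 0:
        checkpoint = node_a.snapshot()

# "Node B" rolls back to the last recorded state and replays from there.
node_b = CountingOperator.restore(checkpoint)
for value in events[node_b.offset:]:
    node_b.process(value)
```

Note that the work node A did after its last recorded state is repeated on node B. That is the practical difference between this kind of manual recovery and full fault tolerance, where failover would be automatic.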

It seems clear to me that System S will become the (very) high-end platform of choice for all relevant applications. It is also well positioned to be the preferred solution for unstructured event or stream-based applications even where the requirements are not so onerous, precisely because it has capabilities in this area that other products lack. For structured data, that argument is not quite so clear. Here SQL-based approaches can work well and companies may prefer that approach. In any case, there are competing products that can offer at least equivalent performance, probably at significantly lower cost. In general, however, I am extremely impressed with System S and I expect IBM to become a dominant player in this space.