IBM InfoSphere Streams is a high-performance, low-latency platform for analysing and scoring data in real time. Deployments range from relatively small implementations on a single laptop to multi-node implementations scaling to hundreds of thousands or millions of transactions per second. Typical use cases involve looking for patterns of activity (such as fraud), detecting exceptions to expected patterns (data breaches) or finding meaningful information in what might otherwise be considered noise (six sigma). There are also commercial applications such as analysing how customers are using their cell phones (in conjunction with IBM’s recent acquisition, The Now Factory). In other words, InfoSphere Streams is essentially a query platform.
In addition to working in conjunction with The Now Factory, InfoSphere Streams also integrates with other IBM products including SPSS (for building predictive models that you can score against in real-time), QRadar (for security information and event management: SIEM) along with BigInsights, external visualisation tools (including Watson Explorer) and data integration environments.
In addition to the main InfoSphere Streams product (currently in version 3.2.1) IBM also offers a Quick Start Edition that is available for free download. This is a non-production version but is unlimited in terms of duration.
IBM has two strategies with respect to InfoSphere Streams. In the first place it wants to build a community of users, which is why it introduced the Quick Start Edition during 2013. Secondly, it wants to build ecosystems of applications, and partners building those applications. In this case, it is focusing on the telecommunications sector in the first instance, but expects to expand into other vertical markets as time progresses.
While IBM already has a number of partners for InfoSphere Streams, few of these will be known to readers. The most notable exception is Datawatch. It is not a development partner but instead provides integration with external sources of data such as message queues. IBM supports its own WebSphere MQ, of course, but Datawatch provides the ability to access data from a variety of third party sources.
InfoSphere Streams has a diverse range of users. Early adopters of the technology included hospitals (neo-natal units), wind farms, and oil companies predicting the movement of ice floes, as well as a number of scientific deployments. More recently IBM has identified a number of repeatable and more commercially oriented use cases that it is now focusing on. In the short term, the company is focusing on the retail sector, particularly around data breaches, and the financial sector for fraud prevention and detection as well as risk analytics. Telecommunications is also a focus area but there are many others where InfoSphere Streams might be applicable, such as preventative maintenance and other applications deriving from the Internet of Things.
InfoSphere Streams is both a development environment and a runtime environment for analytics. As a runtime, the product will run on a single server or across multiple clustered servers, depending on the scale of the environment and the ingestion rates required for real-time processing.
As far as development is concerned, when the product was originally launched it used a language called SPADE (stream processing application declarative engine) but it now supports SPL (stream processing language), which is SQL-like (indeed, the product supports IBM’s Big SQL). There is a conversion facility from SPADE to SPL. However, for most practical purposes all of this is under the covers, as the product includes an Eclipse-based drag-and-drop graphical editor for building queries, and this is what business developers, in particular, will generally work with. Using the editor, you drag and drop operators while the software automatically keeps the graphical view you are creating in sync with the underlying SPL source code. Debugging capabilities are provided for those who want to work directly with SPL.
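To make the idea of wiring operators together more concrete, the following is a minimal sketch in Python rather than SPL. It assumes a simple three-stage flow (a source, a filter and a sliding-window average); the function names, tuple fields and threshold are all hypothetical and are not taken from the product or its toolkits.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List

# A stream tuple is modelled here as a plain dictionary (an assumption for
# illustration only; it is not how InfoSphere Streams represents tuples).
StreamTuple = Dict[str, float]


def source(readings: Iterable[float]) -> Iterator[StreamTuple]:
    """Hypothetical source operator: wraps raw readings in numbered tuples."""
    for seq, value in enumerate(readings):
        yield {"seq": seq, "value": value}


def filter_op(stream: Iterable[StreamTuple], threshold: float) -> Iterator[StreamTuple]:
    """Hypothetical filter operator: passes on tuples at or above the threshold."""
    for item in stream:
        if item["value"] >= threshold:
            yield item


def sliding_average(stream: Iterable[StreamTuple], size: int) -> Iterator[StreamTuple]:
    """Hypothetical aggregate operator: average over the last `size` tuples."""
    window = deque(maxlen=size)  # the oldest value drops out automatically
    for item in stream:
        window.append(item["value"])
        yield {"seq": item["seq"], "avg": sum(window) / len(window)}


def run_pipeline(readings: Iterable[float], threshold: float = 10.0, size: int = 3) -> List[StreamTuple]:
    """Connect the operators into one flow and collect what reaches the sink."""
    return list(sliding_average(filter_op(source(readings), threshold), size))


if __name__ == "__main__":
    for result in run_pipeline([5, 10, 20, 30, 1, 40]):
        print(result)
```

Each function consumes one stream and produces another, which is roughly what connecting operators in a graphical editor amounts to: the flow is defined by how the stages are connected, and each tuple is processed as it arrives.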
As an alternative, you can create predictive models using SPSS Modeler and import these into the Streams environment via PMML (Predictive Model Markup Language) or by using the native SPSS Modeler models and scoring libraries. The environment also supports both Java and R, the statistical programming language, as well as text analytics via natural language processing (which is good for sentiment analysis, intent-to-buy analyses and so forth). Finally, there is support for both geospatial and time-series capabilities. The former supports location-based services, while the latter provides a variety of analytic and other functions (including regressions) that apply wherever data is time-stamped, which makes them especially relevant to the Internet of Things.
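As an illustration of the kind of time-series function mentioned above, here is a minimal Python sketch of a least-squares regression over a sliding window of time-stamped readings. It is not the product's time-series API: the class name, method name and window size are hypothetical, and the calculation is ordinary simple linear regression.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingRegression:
    """Hypothetical helper: fits value = slope * timestamp + intercept
    over the most recent `size` time-stamped readings."""

    def __init__(self, size: int) -> None:
        if size < 2:
            raise ValueError("a regression needs a window of at least two points")
        self.window: Deque[Tuple[float, float]] = deque(maxlen=size)

    def update(self, timestamp: float, value: float) -> Optional[Tuple[float, float]]:
        """Add one reading and return (slope, intercept), or None if no fit is possible yet."""
        self.window.append((timestamp, value))
        n = len(self.window)
        if n < 2:
            return None

        mean_t = sum(t for t, _ in self.window) / n
        mean_v = sum(v for _, v in self.window) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in self.window)
        if var_t == 0:
            # All readings share one timestamp, so the slope is undefined.
            return None

        cov = sum((t - mean_t) * (v - mean_v) for t, v in self.window)
        slope = cov / var_t
        intercept = mean_v - slope * mean_t
        return slope, intercept


if __name__ == "__main__":
    model = RollingRegression(size=3)
    # Assumed sample data: (timestamp in seconds, sensor value).
    for ts, reading in [(0, 1.0), (1, 3.0), (2, 5.0), (3, 5.0)]:
        print(ts, model.update(ts, reading))
```

Because the fit is recomputed as each reading arrives, a change in slope can be flagged as soon as the window reflects it, which is the property that matters for sensor data of the Internet of Things type.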
For data input, InfoSphere Streams supports MQTT (Message Queue Telemetry Transport), which is a lightweight messaging protocol that runs on top of TCP/IP, as well as WebSphere MQ and the open source Apache ActiveMQ. Other messaging protocols and feeds are supported through a partnership with Datawatch, and there is a RESTful API. The product can also access data from back-end data sources such as the various IBM PureData products as well as third party data warehouses like HP Vertica.
For presentation purposes the product comes with a number of pre-defined graphical techniques that can be used to visualise information, and these can be added dynamically at runtime, as required. In addition, you can use both IBM and third party data visualisation products such as, in the case of IBM, Watson Explorer. There is also a facility to visually monitor applications while they are running.
In addition to the normal sorts of training and support services you would expect from any vendor (including extensive online resources), IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.