IBM
Analyst Coverage: Philip Howard, Fran Howarth, David Norfolk and Daniel Howard
IBM is a multinational technology and consulting corporation, with headquarters in New York but a global presence. IBM manufactures and markets computer hardware and software, and offers infrastructure, hosting and consulting services in areas ranging from mainframe computers to nanotechnology. The company was founded in 1911 and over the years has grown through acquisition into its present state.
IBM initially made its profits largely from selling hardware, but these days it is more of a software solutions and services company. It currently stresses the delivery of better business outcomes through smarter computing, technology and process, in support of its 'Smarter Planet' vision. IBM has initiated various emerging open standards and open environment initiatives (such as Eclipse, Jazz, OSLC, RELM and so on) and owns many significant brands such as DB2 Universal Database, WebSphere and System z.
IBM Doors
Last Updated: 28th May 2013
Doors is an industrial-strength (and expensive) requirements management tool which IBM acquired when it bought Telelogic. The original version of Doors focused more on storing and managing requirements than on visualising or modelling them; it was particularly suited to formal Systems Engineering and to building systems that needed formal compliance with standards, and it had strong reporting capabilities. However, older versions of Doors had limited (view-only) web access functionality and poor integration with Microsoft Office.
Doors now ships with Doors Next Generation (Doors NG), which addresses many of the perceived limitations of Doors without compromising its strengths, while Doors Web Access addresses the web access issue. Doors NG works with traditional Doors (using OSLC specifications) and comes with Doors. It adds collaboration capabilities for multi-disciplinary product development teams and a lighter-weight requirements process (suitable, perhaps, for teams migrating from requirements stored in documents and spreadsheets to something more effective). It provides full web-based access together with an optional rich client on the Microsoft platform. Doors NG is probably a sign of where Doors is going: towards a tool with full configuration management built in, so that different teams can work on different versions of, essentially, the same requirements at once, and local or market-specific versions of common requirements can be maintained.
Doors is bought through normal IBM channels. Doors NG is supplied with Doors, if you buy the latest Doors release. A trial version is available.
Doors is targeted particularly at (but not limited to) Systems Engineering professionals working on high-value, regulated or safety-critical systems: defence, aerospace, health, transport and so on.
Doors is a client/server application with its own database; its back-end runs on a range of Windows, UNIX and Linux servers.
IBM offers all the services, available across the globe, that one might expect to need in support of this product.
Particularly noteworthy is the OSLC community. OSLC, or Open Services for Lifecycle Collaboration, is a set of interface specifications that allows Doors to produce and consume services with, say, Rational Team Concert (an Agile application lifecycle management (ALM) solution) or, potentially, other tools supporting OSLC.
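Since OSLC interfaces are essentially RESTful HTTP services exchanging RDF-based resources, consuming one requires no special tooling. Below is a minimal Python sketch of the general shape of such a call; the server URL, resource path and credentials are placeholders rather than real Doors endpoints, so treat it as illustrative only.

```python
import requests

# Hypothetical OSLC provider: real URLs depend on your Doors/Jazz deployment.
BASE_URL = "https://jazz.example.com/rm"

def fetch_requirement(resource_url, session):
    """Fetch a single OSLC requirement resource as RDF/XML."""
    response = session.get(
        resource_url,
        headers={
            "Accept": "application/rdf+xml",   # OSLC resources are RDF-based
            "OSLC-Core-Version": "2.0",        # request OSLC 2.0 representations
        },
    )
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    session = requests.Session()
    session.auth = ("user", "password")  # placeholder credentials
    rdf = fetch_requirement(BASE_URL + "/resources/REQ-123", session)
    print(rdf[:500])
```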
Doors is a market leader in the formal requirements management space, so (allowing for the fact that not every developer is skilled or experienced in using requirements management) it should not be hard to find Doors expertise. IBM supplies Doors consultancy and training courses, both in the classroom and online (video); and a range of external trainers and consultants are also available to facilitate deployment of Doors in the organisation (although always remember that institutionalising requirements management is as much a cultural/people issue as a technology one).
IBM Informix Warehouse Accelerator
Last Updated: 6th May 2013
IBM offers various editions of Informix that are based on the Informix Warehouse Accelerator. This is an extension to the normal database used for transactional purposes. Typically, the Warehouse Accelerator will be implemented on the same system as the relevant transactional environment, with analytic data processed in its own memory space so that there is no conflict with operational aspects of the environment - transactional performance should not be impacted.
A major feature of Informix is that it natively supports time series, which both minimises space when you need to store data with time stamps and improves performance. We should state that we know of no other transactional database that has this support. Also pertinent is that Informix has native geospatial capabilities.
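To make the time series support concrete, here is a hedged sketch of how a client application might query such data from Python over ODBC. The schema, DSN and exact TimeSeries SQL are assumptions for illustration: Clip() is a genuine Informix TimeSeries function, but signatures and casts vary by version.

```python
import pyodbc  # any Informix-capable ODBC driver; the DSN below is a placeholder

conn = pyodbc.connect("DSN=informix_dw;UID=user;PWD=password")
cur = conn.cursor()

# Hypothetical schema: one row per meter, with all of its readings held in a
# single TimeSeries column instead of one table row per timestamped reading.
# Storing the series natively is what saves space and speeds range scans.
cur.execute("""
    SELECT meter_id,
           Clip(readings,
                '2013-01-01 00:00'::datetime year to minute,
                '2013-01-02 00:00'::datetime year to minute)
    FROM smart_meters
    WHERE region = ?
""", ("EMEA",))

for meter_id, day_of_readings in cur.fetchall():
    print(meter_id, day_of_readings)
```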
Informix has been focused on the VAR and ISV market by IBM, and the Warehouse Accelerator will allow partners to extend their existing applications built on top of Informix and to introduce new operational analytic capabilities for their clients. In terms of direct sales, however, IBM is targeting smart metering (and therefore utilities and energy companies) with the Informix Warehouse Editions. This application has both operational and analytic requirements. In the former case, for example, you need to recognise outages and handle them in a timely manner, while in the latter case you need analytics for planning purposes (at what times of day do we need to ramp up power production?), marketing (what incentives can we provide for more efficient use of resources?) and fraud detection.
The company targets telecommunications, public sector and retail as sectors in addition to smart metering. It is also examining which markets might be most suitable to use Informix's geospatial capabilities both as a stand-alone feature and in conjunction with time stamps.
While Informix has customers numbered in the tens of thousands, and there are many Informix customer references on the IBM web site (the most recent being Cisco), these are predominantly for Informix per se rather than the Warehouse Accelerator, and those that we do know of in this category do not want their names publicised.
The Informix Warehouse Accelerator enables dynamic in-memory query processing, together with parallel vector processing and advanced compression techniques, along with a column-based approach that avoids any requirement for indexes, temporary space, summary tables or partitions. In other words, it is well suited to supporting analytic applications, because the absence of these features means that administration is both minimised and consistent across transactional and analytic environments.
Typically, the Warehouse Accelerator will be implemented on the same system as the relevant transactional environment. When this is the case you use Smart Analytics Studio, which is a graphical development tool, to define the data (and its schema) that you want to query and the Warehouse Accelerator will automatically offload this data, which is now stored separately from the OLTP environment. It is processed in its own memory space so that there is no conflict with the operational aspects of the environment and transactional performance will not be impacted. Note that there is no need to change your existing business intelligence tool(s).
The optimiser has been specifically designed to support both transactional and analytic workloads when a hybrid environment is being deployed. The optimiser knows what data is in the data mart and what is not: it will determine whether the query can be satisfied by the Accelerator and, if so, it routes the query there. If not, it will choose to execute the query within Informix. Now, if a query saves the result into a temporary table as part of the Select statement, as is often done by certain BI tools, then the Accelerator can speed up that portion of the query.
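The following Python fragment is a conceptual sketch of that routing decision - emphatically not IBM's optimiser code, and the table names are invented - but it captures the principle: a query is sent to the Accelerator only if every table it references has been offloaded to the data mart.

```python
# Conceptual sketch only: mimics the routing behaviour described above.

ACCELERATED_TABLES = {"sales_fact", "time_dim", "region_dim"}  # tables loaded into the mart

def route_query(referenced_tables):
    """Route a query to the Accelerator only if it can be fully satisfied there."""
    if set(referenced_tables) <= ACCELERATED_TABLES:
        return "accelerator"    # in-memory, columnar execution
    return "informix"           # fall back to the row-based OLTP engine

print(route_query(["sales_fact", "time_dim"]))     # -> accelerator
print(route_query(["sales_fact", "orders_oltp"]))  # -> informix
```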
The smart metering market, at which the Informix Warehouse Editions are mainly aimed, is still in its very early stages. However, whether it is for smart metering or not the ability that Informix offers, to perform serious analytics on the same platform as transactional functions, is a strong one. The way that the system has been designed is not simply as one system for two different functions - which we do not regard as ideal - but, effectively, as two different systems on the same box and with the same administration; both linked by some clever software and purpose-built optimiser functions. Thus from a theoretical point of view, Informix has every chance of success within its target market. The danger is that while IBM has recently ramped up its development and marketing of Informix, this may not last if returns are not seen in the short to medium term. Informix is very much IBM's third database, after DB2 and Netezza, and it is important that momentum is maintained.
In addition to the normal sorts of training and support services you would expect from any vendor, IBM offers business services (application innovation, business analytics, business strategy, functional expertise, midmarket expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
IBM InfoSphere Discovery
Last Updated: 8th May 2013
IBM InfoSphere Discovery was previously known as Exeros, before the company of that name was acquired by IBM. It is a data profiling and discovery tool. While it has always technically been a part of the InfoSphere portfolio, it was originally marketed along with the company's Optim solution to support data archival, masking and similar functions. However, there is another InfoSphere product that also offers data profiling, known as Information Analyzer. In our view Discovery is much the superior product (because it offers extended discovery capabilities) for processes such as data migration and in supporting master data management (MDM) initiatives. IBM needs to make clear that Information Analyzer is really only suitable in data quality environments where it is simply a question of profiling individual data sources and there is no requirement for cross-source analysis.
Major elements in the Discovery platform are the Discovery Engine and Discovery Studio. The former is the component (or components - you may have multiple engines for scalability purposes) that does the actual process of discovering business rule transformations, data relationships, data inconsistencies and errors, and so on. Where appropriate it generates cross-reference tables that are used within the staging database, it creates metadata reports either in HTML format or Excel, and it generates appropriate SQL, XML (for use with Exeros XML) and ETL scripts (for use in data migration and similar projects). Discovery Studio, on the other hand, is the graphical user interface employed by data analysts or stewards to view the information (both data and metadata - Discovery works at both levels) discovered by the engine; and to edit, test and approve (via guided analysis capabilities) relationships and mappings from a business perspective.
InfoSphere Discovery is an enabling tool rather than a solution in its own right so it is horizontally applicable across all sectors. It will be particularly useful where it is necessary to understand business entities (for example, a customer with his orders, delivery addresses, service history and so on) and process those business entities as a whole. Notable environments that require such an approach include application and database-centric data migrations, master data management and archival.
We have no doubt that IBM has many successful users of InfoSphere Discovery. However, you would not know it from IBM's web site, which includes just two case studies featuring the product - CSX and Fiserv - and in neither is the use of InfoSphere Discovery actually discussed; the product is simply listed as one of the IBM products in use.
In addition to providing conventional data profiling capabilities (finding and monitoring data quality issues) Discovery supports the discovery of orphaned rows, scalar relationships (simple mappings, substrings, concatenations and the like), arithmetic relationships between columns, relationships based on inner and outer joins, and correlations for which cross-reference tables are generated. Cross-source data analysis is available both to discover attribute supersets and subsets, and to identify overlapping and unique attributes. In the latter case there is a visual comparison capability that allows you to compare record values from two different sources on a side-by-side basis. In addition there are automatically generated source rationalisation reports that compare data sources to one another. Further features include support for filtering, aggregations and if-then-else logic, amongst others.
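As a conceptual illustration of what cross-source analysis involves, the short Python sketch below compares the value sets of two columns drawn from different sources and reports their overlap and unique values. Discovery automates this kind of comparison at scale, with statistical validation; the function and sample data here are invented for illustration.

```python
# Conceptual sketch: cross-source overlap analysis of the kind Discovery automates.

def column_overlap(source_a, source_b):
    """Compare the value sets of two columns and report overlap statistics."""
    a, b = set(source_a), set(source_b)
    shared = a & b
    return {
        "overlap_pct_of_a": 100.0 * len(shared) / len(a) if a else 0.0,
        "only_in_a": a - b,   # candidate orphans / unique attributes
        "only_in_b": b - a,
    }

crm_ids = ["C1", "C2", "C3", "C4"]
billing_ids = ["C2", "C3", "C9"]
print(column_overlap(crm_ids, billing_ids))
```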
There is also a Unified Schema Builder designed specifically to support new master data management, data warehousing and similar implementations that includes precedence discovery, and empty target modelling and prototyping. There are also facilities for cross-source data preview, automated discovery of matching keys (that is, a cross-source key for joining data across sources), automated discovery of business rules and transformations across two or more data sets with statistical validation, and automated discovery of exceptions to the discovered business rules and transformations.
This is what we wrote in our 2012 Market Report on Data Profiling and Discovery: "since our last report into this market IBM has acquired Exeros, which was market leading for discovery purposes at that time. It should therefore come as no surprise that IBM offers the best understanding of relationships of any product that we have examined. For pure profiling capabilities IBM InfoSphere Discovery is good without being outstanding but it is clearly one of the market leaders when discovery is required alongside profiling." That view remains unchanged: in support of MDM, migration, archival and similar environments Discovery is clearly the leading product in the market.
In addition to the normal sorts of training and support services you would expect from any vendor, IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
IBM InfoSphere Optim Test Data Management
Last Updated: 20th June 2014
Test Data Management (TDM) is about the provisioning of data for non-production environments, especially for test purposes but also for development, training, quality assurance, demonstrations or other activities. Historically, the predominant technique for provisioning test data has been copying some production data or cloning entire production databases. Copying 'some' of the data has issues with ensuring that the data copied is representative of the database as a whole, while cloning entire databases is either expensive (if each team has its own copy) or results in contention (if different teams share the same copies). The other major problem with these approaches is that because of cost/contention issues the environments are not agile and you need agile test data to complement agile development.
IBM InfoSphere Optim Test Data Management resolves these issues, and supports a DevOps-based approach, by allowing you to create referentially intact subsets of the database that accurately reflect the structure of the original data source. This is important because you want test data to be representative of the live environment, as otherwise important test cases can be missed. In the case of IBM InfoSphere Test Data Management, subsets may be of different sizes for different testing purposes: for example, you might want a larger test database for stress testing than for some other types of tests.
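The sketch below illustrates the principle of a referentially intact subset, using Python and SQLite: a sample of parent rows is taken, and then exactly those child rows whose foreign keys point at the sampled parents are extracted, so every reference in the subset still resolves. Optim does this across arbitrarily complex, multi-table schemas; this two-table example shows only the core idea.

```python
import sqlite3

# Conceptual sketch of a referentially intact subset, not Optim's implementation.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers VALUES (1,'Ann'),(2,'Bob'),(3,'Cy');
    INSERT INTO orders VALUES (10,1),(11,1),(12,2),(13,3);
""")

# 1. Choose a subset of parent rows (here: a simple two-row sample).
parent_ids = [row[0] for row in src.execute("SELECT id FROM customers LIMIT 2")]

# 2. Extract only the child rows whose foreign keys point at the sampled
#    parents, so the subset remains referentially intact.
placeholders = ",".join("?" * len(parent_ids))
orders = src.execute(
    f"SELECT * FROM orders WHERE customer_id IN ({placeholders})", parent_ids
).fetchall()
print("customers:", parent_ids, "orders:", orders)
```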
There are, in fact, two versions of IBM InfoSphere Optim Test Data Management: one for z/OS and one for other systems. The latter is currently at version 9.3.
IBM InfoSphere Optim Test Data Management is horizontally applicable across all sectors and is marketed directly by IBM. It is particularly useful in DevOps environments, where development and operations collaborate with one another, and it especially complements agile development methodologies.
IBM also has partners marketing specific solutions based on IBM InfoSphere Optim Test Data Management. For example, TouchWorld offers a solution for test data for SAP applications.
IBM’s InfoSphere Optim Test Data Management Solution has a global client base with over a thousand clients worldwide across a variety of industries. DevOps and the need for Continuous Testing are currently driving more demand for an effective test data management solution. Data masking for non-production environments is another requirement creating demand. Customers using IBM Optim Test Data Management include Nationwide Insurance, Conway Inc., GEICO, CSX Corporation, Allianz Seguros, Cetelem (part of BNP Paribas), Dignity Health and HM Land Registry.
IBM's InfoSphere Test Data Management solution provides capabilities to: discover and understand test data, discover sensitive data, subset data from production, mask test data, refresh test data and analyse test data results. This integrates with the IBM Rational Test Workbench as well as other leading testing suites such as HP Quality Center. You can also use the IBM Rational Service Virtualisation Product (previously Greenhat) and IBM’s Rational Urbancode Deploy in conjunction with this offering. This service virtualisation is important when (some) test data is derived from external sources or is otherwise not easily available.
Notable features of IBM InfoSphere Optim Test Data Management include a range of data masking options, very advanced discovery capabilities, a data comparison capability, data sub-setting based on an understanding of data relationships (to ensure that test data is representative) and support for multi-sized data subsets. InfoSphere Optim Test Data Management can generate a completely anonymous set of data, but this requires a specification of the original data set, which can be a barrier for some clients. Note that this differs from synthetic generation, in which data is generated from a profile of the data rather than from a base dataset. It is likely that IBM will introduce synthetic generation at some point in the future.
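The distinction between masking and synthetic generation can be made concrete with a small Python sketch. Neither function below is IBM's algorithm; both are minimal illustrations: masking derives new values deterministically from the original records (so cross-table joins still line up), while synthetic generation needs only a profile of the data.

```python
import hashlib
import random

def mask(value, secret="salt"):
    """Masking: derived deterministically FROM the original record."""
    digest = hashlib.sha256((secret + value).encode()).hexdigest()
    return "CUST-" + digest[:8]

def synthesise(profile, rng):
    """Synthetic generation: driven by a PROFILE of the data, no base record needed."""
    return "CUST-" + "".join(rng.choice("0123456789") for _ in range(profile["id_digits"]))

print(mask("Alice Jones"))                            # same input -> same masked output
print(synthesise({"id_digits": 8}, random.Random(42)))
```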
There are actually three phases to deployment of IBM InfoSphere Optim Test Data Management: move, edit and compare. In the move phase you extract, copy and move the data (which may come from multiple sources), not necessarily just for creating test data but also to support data migration and data ageing. In the edit phase you can view, and edit if necessary, the extracted data, which can be in any arbitrarily complex schema. In the compare phase you can compare data from one set of source tables with another, either online in the Optim GUI or in reports. Compare enables automated comparison of two sets of data, so that users can do things like compare data after a test run with the baseline 'before' version, track database changes, or compare different data sources. The three phases, deployed together, enable organisations to conduct iterative testing easily, for faster and more comprehensive testing.
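A compare step of this kind boils down to diffing two datasets by key. The Python sketch below is a conceptual illustration of that, not Optim's implementation: it reports rows added, removed and changed relative to a baseline.

```python
# Conceptual sketch of the compare phase: diff two result sets keyed by primary key.

def compare(baseline, after):
    added   = {k: after[k] for k in after.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - after.keys()}
    changed = {k: (baseline[k], after[k])
               for k in baseline.keys() & after.keys()
               if baseline[k] != after[k]}
    return added, removed, changed

before_run = {1: "Ann", 2: "Bob", 3: "Cy"}
after_run  = {1: "Ann", 2: "Bobby", 4: "Dee"}
print(compare(before_run, after_run))
```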
The product runs on Windows, Linux and UNIX platforms and supports not just IBM’s own database products (DB2, Informix et al) but also leading third party database products such as Oracle, SQL Server and Teradata.
In addition to the normal sorts of training and support services you would expect from any vendor (which includes extensive online resources), IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process out-sourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
IBM InfoSphere Streams
Last Updated: 30th May 2014
IBM InfoSphere Streams is a high performance, low latency platform for analysing and scoring data in real-time. Environments where InfoSphere Streams might be deployed range from relatively small implementations on a single laptop to multi-node implementations scaling to hundreds of thousands or millions of transactions per second. Typical use cases involve looking for patterns of activity (such as fraud), exceptions to expected patterns (data breaches) or extracting meaningful information from what might otherwise be considered noise (six sigma), as well as commercial applications such as analysing how customers are using their cell phones (in conjunction with IBM’s recent acquisition, The Now Factory). In other words, InfoSphere Streams is essentially a query platform.
In addition to working in conjunction with The Now Factory, InfoSphere Streams also integrates with other IBM products including SPSS (for building predictive models that you can score against in real-time), QRadar (for security information and event management: SIEM) along with BigInsights, external visualisation tools (including Watson Explorer) and data integration environments.
In addition to the main InfoSphere Streams product (currently in version 3.2.1) IBM also offers a Quick Start Edition that is available for free download. This is a non-production version but is unlimited in terms of duration.
IBM has two strategies with respect to InfoSphere Streams. In the first place it wants to build a community of users, which is why it introduced the Quick Start Edition during 2013. Secondly, it wants to build ecosystems of applications, and partners building those applications. In this case, it is focusing on the telecommunications sector in the first instance, but expects to expand into other vertical markets as time progresses.
While IBM already has a number of partners for InfoSphere Streams, few of these will be known to readers. The most notable exception is Datawatch. The latter is not a development partner but instead provides integration capabilities to external sources of data such as message queues: IBM, of course, supports its own WebSphere MQ, but Datawatch provides the ability to access data from a variety of third party sources.
InfoSphere Streams has a diverse range of users. Early adopters of the technology included hospitals (neo-natal units), wind farms, and oil companies predicting the movement of ice floes, as well as a number of scientific deployments. More recently IBM has identified a number of repeatable and more commercially oriented use cases that it is now focusing on. In the short term, the company is focusing on the retail sector, particularly around data breaches, and the financial sector for fraud prevention and detection as well as risk analytics. Telecommunications is also a focus area but there are many others where InfoSphere Streams might be applicable, such as preventative maintenance and other applications deriving from the Internet of Things.
InfoSphere Streams is both a development and runtime environment for relevant analytics. In the case of the latter the product will run on a single server or across multiple, clustered servers depending on the scale of the environments and ingestion rates for real-time processing.
As far as development is concerned, when the product was originally launched it used a language called SPADE (stream processing application declarative engine) but it now supports SPL (stream processing language), which is SQLesque (indeed, the product supports IBM’s Big SQL). There is a conversion facility from SPADE to SPL. However, for most practical purposes all of this is under the covers as the product includes an Eclipse-based drag-and-drop graphical editor for building queries that business developers, in particular, will generally work with. Using this you drag and drop operators while the software automatically syncs the graphical view you are creating with the underlying (SPL) source code. Debugging capabilities are provided for those that want to work directly with SPL.
As an alternative you can create predictive models using SPSS Modeler and import these into the Streams environment via PMML (Predictive Model Markup Language) or using the native SPSS Modeler models and scoring libraries. The environment also supports both Java and R, the statistical programming language, and text analytics via natural language processing (which is good for sentiment analysis, intent-to-buy analyses and so forth). Finally, there is support for both geospatial and time-series capabilities, with the former supporting location-based services and the latter providing a variety of analytic and other functions (including regressions) that are particularly relevant where data is time-stamped, which is especially relevant to the Internet of Things.
For data input, InfoSphere Streams supports MQTT (Message Queue Telemetry Transport), which is a lightweight messaging protocol that runs on top of TCP/IP, as well as WebSphere MQ and the open source Apache ActiveMQ. Other messaging protocols and feeds are supported through a partnership with Datawatch and there is also a RESTful API. There is also support for accessing data from back-end data sources such as the various IBM PureData products as well as third party data warehouses like HP Vertica.
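As an illustration of how lightweight MQTT ingestion can be, the following Python sketch subscribes to a topic using the open source Eclipse Paho client. The broker address and topic are placeholders, and in a real deployment the message handler would feed events into the streaming engine rather than print them.

```python
import paho.mqtt.client as mqtt  # pip install paho-mqtt

# Minimal MQTT consumer of the kind that might feed a streaming platform;
# the broker address and topic below are placeholders, not IBM defaults.

def on_message(client, userdata, msg):
    # In a real pipeline this handler would hand the event to the stream engine.
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()  # older Paho API; newer versions also take a callback-API version argument
client.on_message = on_message
client.connect("broker.example.com", 1883)  # 1883 is MQTT's standard TCP port
client.subscribe("sensors/#")               # wildcard: all sensor topics
client.loop_forever()
```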
For presentation purposes the product comes with a number of pre-defined graphical techniques that can be used to visualise information, and these can be dynamically added at runtime as required. In addition, you can use both IBM and third party visualisation products such as, in the case of IBM, Watson Explorer. There is also a facility to visually monitor applications while they are running.
In addition to the normal sorts of training and support services you would expect from any vendor (including extensive online resources), IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
IBM PureData System for Analytics
Last Updated: 15th July 2014
The IBM PureData System for Analytics is a relational data warehouse appliance. It is the successor to the Netezza appliance acquired by IBM in 2010; the tag line reads "powered by Netezza technology". It is a massively parallel processing (MPP) database, preinstalled and preconfigured so that it works with little or no on-site set-up. This was the differentiator that Netezza established very successfully, and which prompted the subsequent popularity of appliances and appliance-like products, not just for data warehousing but for other database applications.
IBM delivers the PureData System for Analytics through its global direct sales force. There are plentiful reference customers in Telecommunications, Finance & Banking, Retail and Marketing Analytics, and there does not seem to be any industry that is not penetrated to at least some extent.
Users span a wide set of analytics and reporting use cases, with particular growth in in-database analytics since the introduction of a wide range of analytics capabilities; for example support for R and a partnership with Revolution Analytics for their strong R-based algorithms.
Customers who adopt a 'logical data warehouse'—an extended data management architecture, comprising data in multiple formats and originating from multiple sources—sometimes describe the role of the PureData System for Analytics as the “relational data lake” in that architecture. Others are using it in more traditional warehousing roles; as data marts, multi-subject-area warehouses, central warehouses and any combination of these.
IBM is experiencing strong growth within existing large corporate accounts, as data volumes expand and more use cases are implemented. There is also strong growth in new accounts.
This suggests that the basic proposition—faster complex, analytic queries—has resonance in a wide range of situations. While scalability (see Technology) allows support of very large databases, the low entry point (1/4 rack) means that the PureData System for Analytics solution is also available to SMBs. Our research indicates that the 1/4 rack machine was specifically added to appeal in European markets where data volumes are typically lower and there is a greater tendency to start small and incrementally build.
The PureData System for Analytics is shipped in units of 1/4 rack, 1/2 rack, one rack, two racks, four racks and eight racks, which provides capacity for over 700TB of user data on a single appliance.
The shared-nothing MPP architecture has been refined over a number of generations by IBM, based on the Netezza multi-core processors and patented FPGA (Field Programmable Gate Arrays) co-processor architecture. The FPGAs provide a unique way to stream query results from storage, avoiding pre-loading into memory. This has long been a successful technology for Netezza and, subsequently, IBM.
Recent iterations of the system have added additional performance-enhancing refinements such as columnar compression, snippet result caching, and large memory on processors. The PureData System for Analytics now ships with over 200 in-database analytic functions, including geo-spatial functions (provided by partner ESRI).
One further valuable addition is the ability for users to licence, and pay for, only part of the capacity of an appliance they have installed, so they have expandability built-in and can match investment more closely to their needs, eliminating the 'chunky' cost of acquisition. This feature is available from one rack and upwards. The physical 1/2 and 1/4 rack options provide downward scalability.
IBM offers a huge range of services, including application development, analytics, strategy, functional expertise, IT services, outsourcing, hosting and financing.
Since most PureData System for Analytics sales are into organisations with existing, transferable SQL database skills, adoption of the new platform is rarely a major issue.
IBM PureData System for Operational Analytics
Last Updated: 6th May 2013
The PureData System is based on the latest version of the p7 processor that was used in the 7700-based system, but storage capacity has been increased, with 900GB disks now standard, and solid state capacity has been similarly enhanced. Included within the PureData product is not just the hardware but also the latest version of AIX (7.1), InfoSphere Warehouse and its associated products (graphical and web-based tools for the development and execution of physical data models, data movement flows [SQW], OLAP analysis [Cubing Services] and data mining), DB2 (either v9.7 or v10), WebSphere Application Server, Optim Performance Manager (previously DB2 Performance Manager), Tivoli System Automation for Multi-platforms, and a new system console, as well as various other tools and utilities.
Apart from the focus on particular types of environment, as discussed, IBM does not have any particular vertical focus. Indeed, given the company's size you would expect it to be all-encompassing. However, it does offer specialised data models for a number of specific sectors, notably for banking, financial markets, healthcare, insurance, retail and telecommunications. It also offers a more generic pack for customer insight, market and campaign insight and for supply chain insight.
IBM has been a leading vendor in the (enterprise) data warehouse market since its inception. As such it has thousands of customers, both famous and not so famous. Historically it focused on the high end of the market but today it has offerings that scale down to less than 1TB.
The IBM PureData System for Operational Analytics is available in "T-shirt" sizes: extra small, small, medium, large, extra-large, and so on. At the bottom end the system consists of a "Foundation" module; you can then add data modules, each taking up a third of a rack, in any number you like up to 6 racks (that is, 18 data modules), where each data module provides 62.4TB of raw data capacity. As far as failover is concerned, one failover module is required for every three data modules, with the proviso that you must have a failover module in each rack.
Notable features of the PureData System for Operational Analytics include advanced compression, piggy-back scans and multi-dimensional clustering. With v10.1 of DB2 you also get zigzag joins, continuous ingest and time-travel queries. Once DB2 v10.5 (released April 2013) is available as a part of this system (which we expect in due course), this will provide BLU Acceleration, which includes columnar storage, dynamic in-memory caching, parallel vector processing, data skipping and even more advanced compression. All of these features are designed to significantly improve performance.
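Data skipping is worth a brief illustration. The idea is to keep a small synopsis (for example the minimum and maximum value) for each block of a column, so that whole blocks can be eliminated from a scan without being read. The Python sketch below shows the principle only; BLU Acceleration's actual synopsis structures are more sophisticated.

```python
# Conceptual sketch of data skipping via per-block min/max synopses.

def build_synopses(values, block_size=4):
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def scan_gt(synopses, threshold):
    """Scan only blocks whose max exceeds the threshold; skip the rest entirely."""
    hits, skipped = [], 0
    for lo, hi, block in synopses:
        if hi <= threshold:
            skipped += 1          # the synopsis proves no row in the block can qualify
            continue
        hits.extend(v for v in block if v > threshold)
    return hits, skipped

syn = build_synopses([1, 2, 3, 4, 50, 60, 70, 80, 5, 6, 7, 8])
print(scan_gt(syn, 10))   # -> ([50, 60, 70, 80], 2)  i.e. two blocks never read
```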
Over the last decade or so there has been a significant shake-up in the data warehousing space. However, a clear pattern has now emerged. There are:
- Merchant database vendors that believe in a one-size-fits-all approach to both transaction processing and warehousing.
- IBM: a merchant database vendor that believes that you cannot get the best performance characteristics from a system that is supposed to cater to both transaction processing and analytics and which has therefore created specialised bundles for specific purposes, such as the PureData System for Operational Analytics.
- Traditional specialist data warehousing vendors that will compete with the PureData System for Operational Analytics.
- A host of newer (and some not so new) vendors that are really offering data marts rather than something that is suitable for use as an enterprise data warehouse.
In other words, the IBM PureData System for Operational Analytics faces exactly the same competitors as IBM data warehousing has faced historically. IBM has been successful over the last twenty years in acquiring a significant slice of the data warehousing market and we see no reason why that should change now.
In addition to the normal sorts of training and support services you would expect from any vendor, IBM offers business services (application innovation, business analytics, business strategy, functional expertise, midmarket expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
IBM Streams and Streaming Analytics
Last Updated: 6th December 2018
Mutable Award: Gold 2018
IBM Streams (previously IBM InfoSphere Streams) is a high performance, low latency platform for analysing and scoring data in real-time. It is part of IBM's Watson & Cloud Platform business. In addition to the main Streams product (currently in version 4.2.x) IBM also offers a Quick Start Edition that is available for free download. This is a non-production version but is unlimited in terms of duration. There is also a cloud-based offering, called IBM Streaming Analytics, that runs on IBM Cloud; there is an optional “lite” version that allows up to 50 free hours of usage per month.
IBM Streams is both a development and runtime environment for relevant analytics. In the case of the latter the product will run on a single server or across multiple, clustered servers depending on the scale of the environments and ingestion rates required for real-time processing. There is also a Java-based version developed to run on edge devices, which has been open-sourced as Apache Edgent. This requires a Java Virtual Machine (JVM) but is otherwise very lightweight. It supports Kafka and MQTT (Message Queue Telemetry Transport) as does Streams, and you can push down analytic functions from Streams into Edgent.
Customer Quotes
"Once we had settled on IBM Streams, we were able to plug in the statistical models developed by our data scientists and embark on a rapid proof of concept, which went very well. From there, we were able to industrialize the solution in just a few months."
Cerner Corporation
"IBM Streams increases accuracy of Hypoglycemic event prediction to ~ 90% accuracy with a three-hour lead time over base rate of 80%."
Medtronic
"With our partner, IBM, we are leveraging the power of the unstructured and structured data through streaming and cognitive capabilities to position ourselves effectively to meet the needs of our customers."
Verizon
Typical use cases for IBM Streams involve looking for patterns of activity (such as fraud), or exceptions to expected patterns (data breaches) or to find meaningful information out of what otherwise might be considered noise (six sigma), as well as commercial applications such as analysing how customers are using their cell phones, or to support Internet of Things (IoT) applications such as predictive maintenance.
As stated, the product is both a development and deployment platform. The latter has been discussed. As far as the former is concerned, the product primarily supports SPL (stream processing language), which is a SQL-like declarative language. However, for most practical purposes this is under the covers, as the product includes an Eclipse-based drag-and-drop graphical editor (Streams Studio) for building queries. Using this you drag and drop operators (which include functions such as record-by-record processing, sliding and tumbling windows, and so on) while the software automatically syncs the graphical view you are creating with the underlying (SPL) source code. Debugging capabilities are provided for those who want to develop directly with SPL. In addition to SPL, Streams Studio also supports development in Java, Python and Scala (via a Java API). SPL will typically outperform the likes of Python, since SPL is compiled (via C) into native code whereas much of Python is interpreted.
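To illustrate the kind of operator being dragged and dropped, here is a conceptual Python rendering of a tumbling window: a fixed-size, non-overlapping window that emits and then resets. SPL expresses this declaratively; the code below merely demonstrates the semantics.

```python
# Conceptual sketch of a tumbling-window operator (not SPL itself).

def tumbling_window(events, size):
    window = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window          # emit the full window, then start afresh
            window = []
    if window:
        yield window              # flush the final partial window

readings = [3, 7, 1, 9, 4, 2, 8]
for w in tumbling_window(readings, 3):
    print(w, "avg:", sum(w) / len(w))
```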
Currently in beta, an alternative called Streams Designer offers a web-based environment, which is reputedly easier to use. While the current Streams Studio is usable by business analysts we expect Streams Designer (Figure 1 illustrates flow editing in Streams Designer) to be more popular amongst this constituency.
Figure 2 illustrates some of the functions of IBM Streams as well as the connectivity options that are available. There are, however, notable capabilities omitted from the figure. In terms of functions these include integration with IBM’s rules engine and the ability to do deep packet inspection. There is also no mention of the Db2 Event Store, which can be used to persist events. Figure 2 also fails to cover support for PMML (Predictive Model Markup Language) for model scoring portability. It is also worth mentioning integration with Apache Beam (via an API), which is a software development kit (SDK) for constructing streaming pipelines; this would be an alternative to using Streams Designer. Finally, and by no means least, IBM Streams is delivered with some twenty pre-built machine learning algorithms. These are typically packaged into toolkits for specific verticals, such as cybersecurity.
The ability to ingest and analyse data in real-time is fundamental to many existing and developing environments. The most commonly cited are fraud applications on the one hand and Internet of Things based applications on the other. However, while IBM Streams is clearly one of the market leaders when it comes to both performance and analytics capability for such conventional requirements, it has also been extended into areas that other vendors cannot reach. As one example, Streams leverages IBM Watson’s speech-to-text capabilities (for call centres, for example); as another, IBM is making significant contributions in the medical arena, and not just with respect to the Medtronic example quoted above. It is also worth noting the internationalisation of Streams, which is available in both single byte and double byte languages.
The Bottom Line
IBM Streams was not the earliest product to be introduced into this market but it is almost a decade old. While modernisation is always an ongoing requirement, the enterprise-class features you require come from the sort of maturity that IBM has in spades.
InfoSphere Guardium
Last Updated: 29th October 2014
IBM’s Guardium products are part of its InfoSphere platform, designed for trusted information purposes. The platform includes data integration, data warehousing, master data management, big data and information governance.
Within InfoSphere, the Guardium products are part of IBM’s overall information integration and governance strategy and are designed to help organisations with their data security and privacy needs. The Guardium products provide centralised controls for data privacy and security, including database activity monitoring, data masking, data encryption and redaction, and vulnerability assessment. Guardium itself was acquired by IBM in 2009.
The Guardium products are designed to address the data security and compliance lifecycle. Core capabilities include protecting both structured and unstructured information in databases, data warehouses and file shares; capturing, analysing and monitoring data traffic; enforcing access controls; monitoring and enforcing policies; finding and classifying sensitive data; finding and analysing vulnerabilities; and supporting forensic investigations. It offers a central management console and full audit capabilities for governance and compliance purposes, along with a dashboard for tracking and resolving data security problems.
Although IBM states that its InfoSphere Guardium products can help organisations streamline operations regardless of their size, these products are generally used by large enterprises, often multinationals with worldwide operations. It claims this product family scales to support tens of thousands of databases in an organisation, including those on-premise, in the cloud or in Hadoop environments for big data.
The Guardium family of products can be purchased in a modular fashion, allowing organisations to choose individual products from within the portfolio, as well as to mix them with components from any other vendor.
These products are aimed at large enterprises that face pressing security and regulatory compliance concerns. In a video on its website, IBM states that its InfoSphere Guardium products are used by some 600 organisations worldwide, including the majority of the top banks, managed healthcare providers, retailers, insurers and telecommunications companies globally.
Designed to protect trusted information in databases, InfoSphere Guardium supports all major database management platforms and protocols on all the major operating systems, as well as a growing range of file and document sharing environments. It also supports a number of enterprise business suites, including those from Oracle and SAP.
InfoSphere Guardium is based on a multi-tier architecture, using connector appliances to gather data from systems under management, which are then collected in a central management console where the data is aggregated and normalised, providing an audit trail and secure management and enforcement of policies. No changes are required to the configurations of systems under management and all functions are managed in one central repository. For creating greater security intelligence, Guardium interfaces with a number of other systems, including LDAP directories for access control, email, change ticketing and SIEM systems.
In addition to the normal sorts of training and support services you would expect from any vendor, IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
InfoSphere Information Server
Last Updated: 27th November 2014
InfoSphere Information Server is IBM’s data integration platform. It supports the whole spectrum of use cases for data integration including application to application integration, B2B integration, ETL (extract, transform and load) and ELT for moving data from one environment to another (from operational systems to a data warehouse is the classic case, nowadays including moving data into Hadoop), data migration and SaaS (software as a service) implementations. In this latter case IBM offers specific capabilities with regard to some third party applications, notably Salesforce.com.
There are also specific editions of InfoSphere Information Server, including: InfoSphere Information Server Enterprise Edition, InfoSphere Information Server Enterprise Hypervisor Edition, InfoSphere Information Server for Data Integration, InfoSphere Information Governance Catalog, and InfoSphere Information Server for Data Quality (which provides data profiling and data cleansing capabilities). There are also editions for data warehousing and for SAP implementations. We are slightly surprised that there is no particular edition for data migration and/or archiving, where you would probably require IBM InfoSphere Discovery as opposed to IBM Information Analyzer, which you get in the Information Server for Data Quality Edition. However, we understand that InfoSphere Discovery and Information Analyzer will be merged during the course of 2015, which will make this unnecessary.
IBM delivers InfoSphere Information Server through its global direct sales force.
As a product that is around 20 years old (InfoSphere DataStage, as it was then, was originally developed by Ardent, which was acquired by Ascential, which in turn was acquired by IBM) there are thousands of users of InfoSphere Information Server and there are a number of case studies published by IBM on its web site, including the company’s own implementation in the office of the CIO, as well as banking customers, universities and others.
Perhaps the most significant aspect of InfoSphere Information Server, outside its role of physically moving and transforming data at scale (including big data integration and grid deployments), is the extent to which it plays a part in information governance. Thus the product includes the ability to define governance policies and rules, a data governance dashboard, integration with InfoSphere Master Data Management, Blueprint Director (used to model and view the integration and governance architecture) and the Information Governance Catalog (previously Business Information Exchange), which provides a business glossary and metadata workbench. There is also a new lineage viewer.
Alongside core capabilities (for example, new connectors to the cloud), the emphasis in the data integration market at present is very much on self-service capabilities, just as it is in business intelligence. InfoSphere Information Server provides this with InfoSphere Data Click, which is a simple web-based interface for on-demand integration, both between conventional sources and for accessing data lakes (integration with Hadoop is built into the product). However, self-service is not only about the user experience; it is also about automating the functions that underpin what the user can do. Thus, for example, Information Server supports the idea that you can move data into, say, an IBM data warehouse using ETL at one time of the day and ELT at another time of the day. However, this has to be manually scheduled, and it would be preferable if there were an optimiser built into the product to automate this and other processes. We understand that such an optimiser is planned. We should also add that this is not a criticism—other vendors in this space do not, typically, have even the existing level of automation that IBM provides. Other self-service capabilities include smart hover (which provides contextual details about an asset when you hover your mouse over it) and semantic search capabilities. There are also new facilities in the latest version (11.3) to support cross-team collaboration.
In addition to the normal sorts of training and support services you would expect from any vendor (which includes extensive online resources), IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process out-sourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.
Rational Requirements Composer
Last Updated: 27th May 2013
IBM Rational Requirements Composer is a 'next generation' requirements management tool - it is part of IBM's Collaborative Life-cycle Management tool-set on the Jazz platform. It claims to use 'just enough' process, in conjunction with a web-based application, to let you manage requirements effectively for iterative, waterfall and agile-at-scale development methodologies (always remembering that true Waterfall was always intended to be an iterative process, of course). It emphasises agile delivery of business outcomes through collaborative development by global teams: enabling all stakeholders (including the customer sponsor, users, marketing, legal and compliance, finance, education, operations and developers) to achieve rapid consensus on requirements and their priority - which could contribute to a DevOps culture.
Requirements Composer supports requirements capture; traceability; collaborative review; visual requirements definition; progress and status reporting; and an audit history of changes to requirements. It can be seen as complementing IBM's other requirements management tools by adding collaboration, capture and visual modelling capabilities on top of their requirements management capabilities.
Rational Requirements Composer is available, and supported, globally. Licenses can be bought through normal IBM channels. However it can also be downloaded from jazz.net and potential users can try it out in a web-based 'sandbox' environment, from a web browser, Eclipse client, Visual Studio client etc.; and potential collaborators can be invited to join in the trial.
Requirements Composer is aimed at Agile developers in the enterprise, especially those aiming for 'agile at scale'. However, it is available to a much wider community than this via jazz.net on the web.
Rational Requirements Composer is a web application running on a wide range of desktops: Windows or various flavours of Linux. Its server runs on various flavours of Linux, on Windows Server, Sun Solaris, AIX on IBM Power Systems and VMware. It uses a wide range of databases to hold requirements data and metadata; Apache Derby is included, but DB2, Oracle and SQL Server are also supported. The Apache Tomcat application server is included, but WebSphere is also supported.
It supports the Open Services for Life-cycle Collaboration (OSLC) specifications, so it can integrate with a range of development tools from IBM (and, potentially, other vendors).
What this all means is that Rational Requirements Composer isn't dependent on being deployed in all-IBM environments, which might not have been so true of past IBM tools.
IBM offers all the services (consultancy, training etc.), available across the globe, that one might expect to need in support of this product.
Particularly noteworthy is the Jazz Community at jazz.net, which represents IBM's next generation of interactive, collaborative tools for delivering business outcomes from smarter automated systems. jazz.net is a lot more than a conventional product support site and, according to IBM, "we use our products to build our products, right here at Jazz.net. You can track our plans, talk to the developers, and try our latest stuff. Help us build the tools that you're dreaming about!". There does, indeed, seem to be a buzz around jazz.net.
Rational Requirements Composer is also part of the OSLC (Open Services for Lifecycle Collaboration) community for tool integration.
A wide range of training courses, both classroom and e-Learning, for Rational Requirements Composer are available from IBM and its partners.
Rational Test Virtualization Server
Last Updated: 17th December 2018
Rational Test Virtualization Server (RTVS) is IBM’s service virtualisation offering. It is part of the company’s Rational Test Workbench, a suite of products that are designed to provide a complete continuous testing solution. The suite also includes a variety of products for enabling functional, performance and API testing and, as you might imagine, these products are well integrated. The Rational Test Workbench is in turn part of IBM DevOps, the branch of IBM devoted to providing products that enable continuous delivery and digital transformation.
Customer Quotes
“IBM Rational Test Workbench software lends itself to complex environments. We use it to test across platforms, which significantly cuts the time required to roll out applications.”
Banking company
“The bottom line: service virtualisation reduces your cost and accelerates development, and that’s what companies need to stay ahead of their competition.”
Sandhata
There are multiple ways to create a virtual service in RTVS. The most straightforward is to record and play back requests and responses to and from a real service. Alternatively, you can import existing definition files (for example, Swagger specifications) or create your virtual service from scratch in ECMAScript. More interestingly, RTVS gives you the option, in conjunction with the aforementioned recording, to examine and ‘synchronise’ with the System Under Test (SUT). This allows RTVS to create both physical and logical models of the SUT, mapping out, respectively, the physical and logical dependencies within your system. The latter will be the more pertinent the majority of the time, allowing you to see, graphically, the different services within the SUT and how they depend on each other. This is useful for helping both technical and non-technical users understand the SUT. More importantly, RTVS allows you to create any number of virtual services based on this model without writing any code (although they may be supplemented with scripting in ECMAScript or Groovy if desired).
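The record-and-playback approach is straightforward to illustrate. The Python sketch below is a conceptual stand-in, not RTVS code: request/response pairs are captured from a 'real' service (here just a function) and subsequently served from the recording, so tests no longer need the real service at all.

```python
# Conceptual record-and-playback sketch; RTVS does this across dozens of protocols.

recording = {}

def record(request_key, real_service):
    """Capture the real service's response for a given request."""
    recording[request_key] = real_service(request_key)

def virtual_service(request_key):
    """Playback: answer from the recording instead of the real service."""
    try:
        return recording[request_key]
    except KeyError:
        raise LookupError(f"no recorded response for {request_key!r}")

# The 'real' service is stood in by a function for the sake of a runnable example.
def real_quote_service(symbol):
    return {"symbol": symbol, "price": 101.5}

record("IBM", real_quote_service)
print(virtual_service("IBM"))   # served from the recording; real service not needed
```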
RTVS provides extremely broad support for messaging protocols and message formats; the full list covers more than 30 different technologies. It can also virtualise systems in addition to services, including products from SAP, TIBCO and, of course, IBM, as well as mainframes. This means that you can test against virtual copies of these systems just as you would a regular virtual service. The product also features sophisticated sifting and passthrough technology, allowing you to, for example, use a virtual service to capture and respond to data coming from a particular source while all other requests pass through to the real service behind it. In addition, RTVS allows you to model the response times of your virtual services on real data, ensuring that they are realistic.
Virtual services created in RTVS can be published to a virtual service repository. This is a store of virtual services, available throughout the enterprise, that is intended to promote the reuse of virtual services by both technical and non-technical users. It is accessed through a browser, allowing all users in your organisation to configure and deploy virtual services as needed. In addition, RTVS supports what IBM calls ‘Incremental Integration Testing’. The principal idea here is that you start out by testing a single, isolated part of your system, virtualising all the components of your system that exist outside of this part. Then, gradually, you replace these virtual components with real ones, testing as each service is replaced. This allows you to test gradually and methodically, rather than having to test everything, together, at once. RTVS enhances this methodology by enabling a technologically seamless transition between virtual and real services, owing to their shared interfaces, as the sketch below illustrates.
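The seamless swap relies on nothing more exotic than a shared interface. The Python sketch below (a conceptual illustration, with invented class names) shows how code under test can be pointed first at a virtual service and later at the real one without any change to the code itself.

```python
from typing import Protocol

class QuoteService(Protocol):
    """The shared interface that both virtual and real services implement."""
    def get_price(self, symbol: str) -> float: ...

class VirtualQuoteService:
    def get_price(self, symbol: str) -> float:
        return 100.0                     # canned response for early-stage testing

class RealQuoteService:
    def get_price(self, symbol: str) -> float:
        raise NotImplementedError("call the production service here")

def code_under_test(service: QuoteService) -> float:
    return service.get_price("IBM") * 1.1

print(code_under_test(VirtualQuoteService()))   # later: swap in RealQuoteService()
```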
Testing is an extremely important part of the software development lifecycle. With software products increasingly developed within an extensive ecosystem of APIs and other services, service virtualisation has become a necessity in order to test your software in isolation of the other components within your system. It is also important for continuous testing, as virtual services, unlike real ones, do not experience downtime. In addition, if you are utilising third party services that charge by usage, using them during testing is an unnecessary expense that can be avoided with service virtualisation.
RTVS stands out for a number of reasons. Incremental integration testing allows you to test parts of your software in isolation before integrating them bit by bit. This allows for a gradual and controlled integration testing process, and is particularly helpful if, for example, you have multiple teams developing co-dependent software. With RTVS, each team can test their software against a virtual copy of the others’, then seamlessly slot in the real software when it is ready. The ability to virtualise systems – particularly mainframes – is another notable feature. A significant number of organisations have legacy systems that cannot be disrupted under any circumstances, and therefore can be difficult to test against. RTVS can virtualise these systems, allowing for effective testing in these environments. Finally, it should be noted that both the visual model of your system and the virtual service repository that RTVS creates are helpful for enabling effective collaboration on virtual services throughout your enterprise.
The Bottom Line
As part of the Rational Test Workbench, RTVS forms a significant part of a complete continuous testing solution. Even on its own, it is a competent service virtualisation product with several interesting and innovative features. If these features are relevant to you – and there is every reason to think that at least some of them will be – we urge you to consider adding RTVS to your shortlist.
Solutions
- IBM BigInsights
- IBM Bluemix
- IBM Cloudant
- IBM DashDB
- IBM DataWorks
- IBM DB2
- IBM Doors
- IBM Informix Warehouse Accelerator
- IBM InfoSphere Discovery
- IBM InfoSphere Master Data Management
- IBM InfoSphere Optim Test Data Management
- IBM InfoSphere Streams
- IBM PureData System for Analytics
- IBM PureData System for Operational Analytics
- IBM Softlayer
- IBM Streams and Streaming Analytics
- IBM Watson Analytics
- IBM Watson Content Analytics
- IBM Watson Engagement Advisor
- IBM Watson Explorer
- IBM Watson User Modeling
- InfoSphere Guardium
- InfoSphere Information Server
- Rational Requirements Composer
- Rational Test Virtualization Server