IBM Db2 Event Store

Updated solution on July 1, 2020

Mutable Award: Gold 2020

What is it?

Db2 Event Store is a database that is packaged within IBM Cloud Pak for Data, though it is also available stand-alone. Briefly, IBM Cloud Pak for Data is an integrated data science, data engineering and app building platform built on top of the Red Hat OpenShift Container Platform. The intention is to a) provide all the benefits of cloud computing but inside your firewall and b) provide a stepping-stone, should you want one, to broader (public) cloud deployments. It has a micro-services architecture and provides an environment that will make it easier to implement data-driven processes and operations and, more particularly, to support both the development of AI and machine learning capabilities, and their deployment.

As far as Db2 Event Store is concerned, it is intended to support both near real-time analytics and deep analytics on historic data. When combined with IBM Streams, the two together support the whole gamut of analytics requirements. Typical applications include anti-fraud, smarter manufacturing, and advanced data modelling on real-time data flows, amongst others. Effectively, Db2 Event Store replaces a so-called Lambda architecture, an approach that involves using multiple databases (typically one for real-time data and another for historic data). This is complex and requires at least two different development environments. An alternative is a Kappa architecture, which uses a single persistent store but, typically, Kappa architectures do not support the sort of deep analytics needed for machine learning and AI. Db2 Event Store, on the other hand, has been designed to provide a single environment that supports all of the requirements outlined, enabled, at least in part, by the common SQL engine.
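To make the complexity argument concrete, the following is a minimal Python sketch. It does not use any IBM API: plain lists stand in for databases, and every name (`lambda_total`, `single_store_total`, `batch`, `speed`) is hypothetical. It only illustrates that a Lambda-style query has to consult two stores and reconcile them, whereas a single store needs one code path.

```python
# Hypothetical illustration only: plain Python lists stand in for databases.
from typing import List, Tuple

Event = Tuple[int, float]  # (timestamp, amount); assumed schema


def lambda_total(batch_store: List[Event], speed_store: List[Event],
                 batch_cutoff: int) -> float:
    """Lambda style: historic data comes from the batch store, recent data
    from the speed store, and the caller must stitch the two together."""
    historic = sum(a for ts, a in batch_store if ts <= batch_cutoff)
    recent = sum(a for ts, a in speed_store if ts > batch_cutoff)
    return historic + recent


def single_store_total(store: List[Event]) -> float:
    """Single-store style: one query covers recent and historic data alike."""
    return sum(a for _, a in store)


events = [(1, 10.0), (2, 20.0), (3, 5.0), (4, 7.5)]
batch = [e for e in events if e[0] <= 2]   # loaded by a batch job
speed = [e for e in events if e[0] > 2]    # held by a streaming layer

print(lambda_total(batch, speed, batch_cutoff=2))
print(single_store_total(events))
```

Both calls give the same answer, but the Lambda version has two stores to keep consistent and a cutoff to manage, which is the sort of overhead a single environment is meant to remove.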

What does it do?

Db2 Event Store is built on top of the Apache Spark platform (including support for Spark geospatial and time-series functions) and provides an in-memory database that stores data in Parquet format. However, it uses the Db2 optimiser (and also some elements of Db2 BLU technology) rather than the standard Spark optimiser, as the former is more efficient. Storage is separated from compute to allow independent scaling.

Figure 1 – Architecture of IBM Db2 Event Store

The architecture of the product is illustrated in Figure 1. The two main concerns are performance for ingest and persistence on the one hand, and performance for query processing on the other. In the case of the former, the log is the database. This means that the amount of synchronous processing required is minimised, with other processes occurring asynchronously. The first of these asynchronous (background) steps is to remove any duplicates, and this is followed by generating a synopsis (used for range queries) and indexes. After this, the data is transformed into Parquet format and compressed. However, this results in multiple small files, which would be inefficient in supporting query processing, so a merge function is used to combine these into larger files. According to IBM, these background processes typically take under a minute from ingest to persistence, but the company is hoping to reduce this to a matter of seconds in forthcoming releases.
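The background steps described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not IBM's implementation: the row schema, the function names (`deduplicate`, `merge_blocks`, `build_synopsis`) and the `target_rows` threshold are all hypothetical. Indexing, Parquet conversion and compression are omitted, and for simplicity the synopsis is computed on the merged blocks.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]  # (device_id, timestamp, reading); assumed schema


def deduplicate(rows: List[Row]) -> List[Row]:
    """Keep the first row seen for each (device_id, timestamp) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row[0], row[1])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def merge_blocks(blocks: List[List[Row]], target_rows: int) -> List[List[Row]]:
    """Combine small blocks into larger ones of at least target_rows rows
    (the final block may be smaller)."""
    merged, current = [], []
    for block in blocks:
        current.extend(block)
        if len(current) >= target_rows:
            merged.append(current)
            current = []
    if current:
        merged.append(current)
    return merged


def build_synopsis(rows: List[Row]) -> Dict[str, int]:
    """Record the min and max timestamp of a block, for later range queries."""
    timestamps = [r[1] for r in rows]
    return {"min_ts": min(timestamps), "max_ts": max(timestamps)}


# Three small ingested batches, two of which contain duplicate rows.
small_blocks = [
    deduplicate([(1, 100, 0.5), (1, 100, 0.5), (2, 101, 0.7)]),
    deduplicate([(1, 102, 0.6)]),
    deduplicate([(2, 103, 0.9), (2, 103, 0.9)]),
]
large_blocks = merge_blocks(small_blocks, target_rows=3)
synopses = [build_synopsis(b) for b in large_blocks]
```

Because none of this sits on the synchronous write path, ingest only has to append to the log; the cost of tidying the data is paid in the background.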

As far as query processing is concerned, a lot of the performance offered by Db2 Event Store is based on the features already mentioned, including the synopses, indexes and in-memory processing. Parallel processing and tiered caching (including CPU caching, a feature of Db2 BLU) also support superior performance, as does use of the Db2 Optimiser. A notable feature of this, planned for release later in 2020, is that IBM is introducing machine learning into the optimiser in order to improve query planning. This is already available as an optional feature within other members of the Db2 family.
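To illustrate why a synopsis helps range queries, here is a minimal Python sketch of min/max data skipping. It is a generic illustration of the technique rather than a description of how Db2 Event Store implements it; the block layout and the names `make_block` and `range_query` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (timestamp, reading); assumed schema


def make_block(rows: List[Row]) -> Dict:
    """Store rows together with a min/max synopsis of their timestamps."""
    ts = [r[0] for r in rows]
    return {"min_ts": min(ts), "max_ts": max(ts), "rows": rows}


def range_query(blocks: List[Dict], lo: int, hi: int) -> Tuple[List[Row], int]:
    """Return rows with lo <= timestamp <= hi and the number of blocks scanned."""
    hits, scanned = [], 0
    for block in blocks:
        # Skip the block if its timestamp range cannot overlap [lo, hi].
        if block["max_ts"] < lo or block["min_ts"] > hi:
            continue
        scanned += 1
        hits.extend(r for r in block["rows"] if lo <= r[0] <= hi)
    return hits, scanned


blocks = [
    make_block([(100, 0.5), (105, 0.7)]),
    make_block([(110, 0.6), (119, 0.8)]),
    make_block([(120, 0.9), (130, 1.1)]),
]
rows, scanned = range_query(blocks, 108, 121)
```

In this example the first block is never read, because its synopsis shows that all of its timestamps fall before the requested range.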

Figure 2 – IBM Data Virtualization Computational Mesh

Apart from its integration with IBM Streams, Db2 Event Store will also benefit from the Data Virtualization capability within IBM Cloud Pak for Data. This is both innovative and ahead of other offerings in the market. The key difference from traditional approaches to data federation/virtualisation is that IBM uses a computational mesh – see Figure 2 – that performs analytics not only locally but also within a local constellation. Given that moving the data across the network is the biggest issue with traditional data federation techniques, this should significantly improve performance. Moreover, this should have a significant impact on costs because you need less infrastructure to get the same (or better) performance. Data sources supported by the computational mesh include the Db2 family, Netezza, BigSQL, Informix, Derby, Oracle, SQL Server, MySQL, PostgreSQL, Hive, Impala, Excel, CSV and text files, MongoDB, SAP HANA, SAS, MariaDB, CouchDB, Cloudant, various Amazon and Azure databases, sundry streaming products (including Kafka), multiple mainframe environments (IBM and others), generic JDBC access and several third-party data warehousing databases. Note that Data Virtualization integrates with the Enterprise Data Catalog, which also forms part of IBM Cloud Pak for Data.
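The network argument can be illustrated with a small Python sketch. It is a generic example of pushing aggregation down to the data sources and is not based on the Data Virtualization API; the function names, the source names and the idea of counting "values moved" are all assumptions made for illustration. The constellation level of the mesh is not modelled.

```python
from typing import Dict, List, Tuple


def centralised_mean(sources: Dict[str, List[float]]) -> Tuple[float, int]:
    """Traditional federation: ship every row to a coordinator, then aggregate.
    Returns the mean and the number of values moved across the network."""
    all_rows = [v for rows in sources.values() for v in rows]
    return sum(all_rows) / len(all_rows), len(all_rows)


def local_partial(rows: List[float]) -> Tuple[float, int]:
    """Work done at the data source: reduce rows to a (sum, count) pair."""
    return sum(rows), len(rows)


def mesh_mean(sources: Dict[str, List[float]]) -> Tuple[float, int]:
    """Mesh style: each source aggregates locally and ships only its partial.
    Returns the mean and the number of values moved across the network."""
    partials = [local_partial(rows) for rows in sources.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, 2 * len(partials)  # two values per source


# Hypothetical sources with a handful of numeric readings each.
sources = {
    "db2": [10.0, 20.0, 30.0, 40.0],
    "postgres": [5.0, 15.0, 25.0],
    "csv": [50.0, 60.0, 70.0, 80.0, 90.0],
}

print(centralised_mean(sources))
print(mesh_mean(sources))
```

Both approaches return the same mean, but the second moves two values per source regardless of how many rows each source holds, which is why local processing matters more as data volumes grow.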

Why should you care?

There have been various attempts to combine the analysis of both (near) real-time and historical data within a single environment and, as we have discussed, Db2 Event Store provides a much less complex solution to this issue than other available approaches. More generally, the implementation of Db2 Event Store along with IBM Cloud Pak for Data brings multiple benefits, not least the data virtualisation provided. We also particularly like the fact that there is a common SQL engine across the Db2 family. Moreover, this engine is compliant with the ANSI 2016 SQL standard, whereas some other vendors are still working with versions of SQL from the last century!

The Bottom Line

We like Db2 Event Store a lot. If you want to combine up-to-the-moment data with historical data for analysis purposes, it should certainly be on your short list. As part of IBM Cloud Pak for Data, it also benefits from the complementary capabilities provided therein, which make the offering compelling.
