IBM Cloud Pak for Data

Solution updated on March 3, 2021

What is it?

IBM Cloud Pak for Data is a cloud-native (microservices, Kubernetes and so forth) data management platform. It places particular emphasis on developing, deploying and managing AI and machine learning models, but it also covers more general-purpose data management. The platform consists of a large number of IBM services that integrate with one another and are accessible through a single interface. Some of these services are included in the base product, while others must be licensed separately. It can be deployed to the most popular public clouds (AWS, Microsoft Azure and Google Cloud), as well as to IBM Cloud and to hyper-converged systems. Its open architecture, built on Red Hat OpenShift, is shown in Figure 1.

This diagram is slightly misleading. It mentions event ingestion and data transformation, but it says nothing about traditional data integration capabilities, which are provided via InfoSphere DataStage for Cloud Pak for Data. Similarly, it mentions data quality and classification, and policies and rules. However, it says nothing about data governance, compliance, or how to discover and manage sensitive data, even though all of these are available. Watson Knowledge Catalog, for example, provides data masking capabilities. Likewise, the diagram mentions data cataloguing but not metadata management, despite IBM being a major driver behind the open-source ODPi Egeria project.

More generally, the services offered by Cloud Pak for Data span a number of areas. These include analytics (of various types, such as prescriptive, predictive, streaming and big data), databases and data warehousing, dashboarding, data virtualisation, data integration, data cataloguing, data governance and, of course, AI and machine learning. Particularly notable products and services include:

Cognos Analytics, IBM’s premier self-service analytics and business intelligence platform; Watson Machine Learning, Watson Studio and Watson Knowledge Studio, all of which support model creation, training, deployment and management in one way or another; Watson OpenScale, which adds bias detection and model explainability to the platform’s AI capabilities; Watson Knowledge Catalog, for data governance and cataloguing; InfoSphere DataStage, for ETL and data integration; and several use-case-specific AI products, such as Watson Assistant for conversational AI, Watson Discovery for AI-enabled enterprise search and Watson Language Translator for AI-driven translation.

These are the headline products. A large number of less glamorous (though not necessarily less useful) services are also provided, such as databases (Db2, Db2 Event Store, Netezza), developer tools and product integrations.

Customer Quotes

“IBM Cloud Pak for Data enabled Sprint to digest high volumes of data for near-real-time ML/AI analysis, and the trial results have shown potential to take Sprint to the next phase of digital transformation.”
Sprint

“One of the great things about the Cloud Pak for Data System is the speed with which we’ll be able to launch and scale our analytics platform. The integrated stack contains what we need to improve data quality, catalog our data assets, enable data collaboration, and build/operationalize data sciences. We’re able to move quickly with design, test, build and deployment of new models and analytical applications.”
Associated Bank

What does it do?

Cloud Pak for Data offers a single, unified platform for in-cloud data management. The platform is itself built around a single, unified data catalogue: Watson Knowledge Catalog. The catalogue is in turn supported by a robust layer of governed data virtualisation, shown in Figure 2 and powered by the appropriately named Data Virtualization. Data Virtualization layers what IBM describes as a computational mesh (automated, self-balancing and scalable) over your data sources. In our opinion, this is the most advanced data virtualisation technology currently available on the market. Importantly, Data Virtualization is in constant communication with Watson Knowledge Catalog. In many ways, it is what enables the catalogue to act as a unified repository for all of your data. Creating a unified view of all your data in a single location has obvious advantages. For the purposes of Cloud Pak for Data, one of the most notable is that it makes it much easier to leverage the full range of your data within your AI models.

On the subject of AI, a number of pre-built AI apps are available within several Cloud Pak for Data services. Moreover, one of the biggest (and most time-consuming) difficulties in developing AI models is discovering, understanding and preparing the data they rely on. With this in mind, Cloud Pak for Data provides a significant DataOps capability that adds automation to your data pipelines. This should speed up those pipelines and shorten the time it takes to deliver data, ultimately enabling more efficient model development. Automation is not confined to data preparation, however. Watson Studio also offers AutoAI, a feature that automates the model creation process itself, which IBM describes as “AI automating AI”.

The enforcement of governance rules and policies can also be automated via Watson Knowledge Catalog. In addition, bodies of knowledge provided by your organisation and by IBM can be built into your catalogue. This centralises and exposes both your own tribal expertise and broader industry and regulatory knowledge, most notably compliance mandates such as GDPR and CCPA. Watson Knowledge Catalog also features automated profiling and classification, as well as AI-driven tagging recommendations.

Why should you care?

The major reason to care about Cloud Pak for Data is that it offers a broad swathe of data management functionality within a single, unified platform. An end-to-end, interoperable data management stack is not unique to Cloud Pak for Data, but it is a significant advantage that distinguishes it from many other data management vendors. In addition, Cloud Pak for Data provides a gradual, at-your-own-pace path for migrating from an on-premises environment to a multi-cloud and/or hybrid cloud environment.

The Bottom Line

IBM Cloud Pak for Data is a broad, cloud-native, highly integrated and interoperable data management platform that is particularly good at enabling your own AI and machine learning efforts.

