Cloud Management

Last Updated: 21st June 2024
Analyst Coverage: Paul Bevan, David Norfolk and David Terrar

What is it?

The boundaries of what we broadly view as the Cloud Management market have widened. It encompasses the performance of private, public and hybrid-cloud infrastructure, as well as traditional on-premises configurations that are increasingly virtualised and cloud-like, and it places a strong focus on cloud cost optimisation. Cloud Management enables IT organisations to deliver relevant, timely performance, health, cost and resource-utilisation metrics across all levels of IT and the Business. It includes elements of Digital Experience Management (DEM) and IT Service Management (ITSM), as well as the more traditional Data Centre, Networking and Application Performance Management disciplines. Changing development and deployment models, characterised by DevOps and containerisation, require management and monitoring tools that provide observability into, and tooling for, these new development environments.

Why is it important?

Business relies on IT for the delivery of its customer propositions. Indeed, for many, IT is the business. The growth and scale of cloud computing, the agility provided by micro-services and new deployment technologies, the power of business intelligence (BI) and analytics, and the immediacy of social media have brought forth new, digital-only business models that operate at a global level, creating new expectations of IT performance and scale in old and new businesses alike.

In this environment, the customer’s experience of the application is critically important. Instrumentation across all key applications and infrastructure is the only way to measure and ensure end-to-end performance and availability. Legacy, silo-specific monitoring tools are no longer adequate: they can’t communicate or relate to one another and have no understanding of the applications running on their components. Application Performance Management (APM) tools alone can’t ensure performance and frequently can’t identify the root cause of performance degradation, especially when it originates in some part of the I/O path, such as the network or storage infrastructure. With the Internet of Things (IoT), the collaborative and global nature of business, and the specific hardware demands of Artificial Intelligence (AI), Machine Learning and Data Science, that infrastructure is becoming ever more complex.

Therefore, being able to monitor and react to performance issues in near real-time and being aware of what parts of this complex IT environment are being used by individual applications is a critical business requirement.

There is also a very important side-effect of a comprehensive Cloud Management system: the ability to benchmark and profile the performance characteristics of applications under different workloads and in different environments. This data can then be used to develop sophisticated capacity-planning models that help reduce the incidence of expensive over-provisioning in public and hybrid-cloud environments.
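As a minimal illustration of how benchmark data can feed a capacity-planning decision, the sketch below suggests a vCPU count from observed CPU utilisation. It is a simple right-sizing heuristic, not a full capacity-planning model: the function name, the use of the 95th percentile and the 20% headroom are all assumptions chosen for illustration.

```python
import math
from statistics import quantiles


def recommended_vcpus(cpu_samples_pct, provisioned_vcpus, headroom=0.2):
    """Suggest a vCPU count from observed utilisation (hypothetical helper).

    cpu_samples_pct: utilisation samples (0-100) across the provisioned vCPUs.
    headroom: fractional buffer added on top of 95th-percentile demand (assumed 20%).
    """
    # With n=20, quantiles() returns 19 cut points; the last one is the 95th percentile.
    # The "inclusive" method keeps the estimate within the observed range.
    p95 = quantiles(cpu_samples_pct, n=20, method="inclusive")[-1]
    # Convert percentage utilisation into the number of vCPUs busy at p95
    busy_vcpus = provisioned_vcpus * p95 / 100
    # Add headroom and round up to a whole vCPU, never recommending fewer than one
    return max(1, math.ceil(busy_vcpus * (1 + headroom)))


samples = [20, 25, 30, 22, 28, 35, 40, 30, 25, 20]  # illustrative benchmark samples
print(recommended_vcpus(samples, provisioned_vcpus=8))  # 4
```

Here an 8-vCPU instance whose 95th-percentile load is under 40% would, under these assumptions, be a candidate for halving. A production model would also account for memory, I/O and workload seasonality.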

What does it do?

Cloud Management differs from traditional application and network performance management tools in its ability to capture and correlate data from a wide range of sources across an entire hybrid IT infrastructure, including public and private cloud deployments, irrespective of vendor. It comprises monitoring, cross-domain correlation and AI-based analytics. It captures granular, real-time information on transaction flows from storage arrays and network devices, and between server VMs, across both on-premises and cloud environments, using a combination of hardware and software probes (to capture packet and flow data), agents, logs and events.

The hardware probes tap physically into storage and network fibre connections, while software probes capture information from physical and virtualised devices in an agentless, non-intrusive manner. Together, these sources ingest huge amounts of real-time data, which is correlated, analysed and presented to IT operations and business managers in user-defined custom views on a single pane of glass.
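Cross-domain correlation can be approached in several ways; one of the simplest is to group events from different sources that occur close together in time. The sketch below assumes hypothetical event records and a 30-second window, and keeps only clusters that span more than one source. Commercial products may also draw on topology and dependency information, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since epoch
    source: str       # hypothetical source labels, e.g. "storage-array", "network-probe"
    message: str


def correlate(events, window_s=30.0):
    """Group events whose gap to the previous event is at most window_s seconds."""
    clusters = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # Start a new cluster when this event is too far from the last one seen
        if not clusters or event.timestamp - clusters[-1][-1].timestamp > window_s:
            clusters.append([event])
        else:
            clusters[-1].append(event)
    # Only clusters spanning more than one source count as cross-domain correlations
    return [c for c in clusters if len({e.source for e in c}) > 1]


events = [
    Event(0.0, "storage-array", "write latency high"),
    Event(10.0, "network-probe", "retransmits rising"),
    Event(12.0, "app-log", "checkout timeout"),
    Event(200.0, "app-log", "cache warm-up"),
]
for cluster in correlate(events):
    print([e.source for e in cluster])  # ['storage-array', 'network-probe', 'app-log']
```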

While Cloud Management systems should be application-aware, it is important to understand that, in a virtualised and micro-services environment, the end-user might be another micro-service. Cloud Management therefore focuses on understanding the infrastructure associated with those service-to-service transactions, as well as the overall application performance that the end-user experiences; this discipline is now generally described by the term Observability.

With the huge amount of data collected, Cloud Management solutions make use of advanced correlation algorithms, machine learning and predictive analytics to provide sophisticated management information dashboards.
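The machine learning used in commercial products is considerably more sophisticated, but the underlying idea of flagging behaviour that departs from a learned baseline can be sketched with a rolling z-score. This is a simple statistical stand-in, not any vendor's algorithm; the window size, the threshold of three standard deviations and the latency series are assumptions for illustration.

```python
from statistics import mean, stdev


def anomalies(values, window=10, threshold=3.0):
    """Return indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat baseline has no spread, so the z-score is undefined
        if sigma == 0:
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250, 100]
print(anomalies(latencies_ms))  # [10]
```

Note that the spike itself inflates the baseline for the following samples, which is one reason real products favour more robust statistics or learned seasonal models.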

Who should care?

Business leaders need to care because the only real reason to care about infrastructure performance and availability is that it has a significant impact on the business. Recent IT problems at British Airways and TSB had direct financial costs of £150 million and £50 million respectively. While it is by no means clear that infrastructure performance issues were the major factor in either case, poor performance can have serious, direct implications. Ten years ago, Amazon was already reporting that every 100ms of latency cost it 1% in sales, while Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%.
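To put the Amazon figure into perspective, the arithmetic can be sketched as below. It assumes, purely for illustration, that the 1%-per-100ms relationship holds linearly for any business and any amount of added latency, which the original finding does not claim; the function name and the sales figure are hypothetical.

```python
def estimated_sales_loss(annual_sales, added_latency_ms):
    """Estimate sales lost to extra latency, ASSUMING a linear 1% per 100 ms."""
    # 1% per 100 ms is 0.01 / 100, i.e. 1/10,000 of sales per millisecond
    return annual_sales * added_latency_ms / 10_000


print(estimated_sales_loss(10_000_000, 300))  # 300000.0
```

On this assumption, a business with £10 million in annual online sales would forgo around £300,000 a year for an extra 300ms of latency.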

Performance problems may be attributable to a variety of different components within a hybrid Cloud infrastructure, or to the applications themselves. Therefore, CIOs, CTOs and IT Operations leaders need to care about the significant problems caused by siloed operational teams and vendor- or technology-specific tools. The business does not want to hear “not my problem, try that lot over there”. It wants the business issue fixed, quickly and effectively. It wants the root cause of an issue identified quickly and collaboratively, and to see someone assigned to address it. It doesn’t want to see developers and technicians playing the “blame game”. Ultimately, comprehensive Cloud Management enables IT to reduce Mean Time to Resolution (MTTR), or, as we have jokingly heard it called, “Mean Time to Proving Innocence”.
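MTTR is, at its simplest, the average elapsed time from an incident being raised to its resolution. Organisations define the start and end points differently, so the sketch below assumes a detected-to-resolved measure; the function name and incident timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual resolution times, starting from a zero timedelta
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


incidents = [
    (datetime(2024, 6, 3, 9, 0), datetime(2024, 6, 3, 9, 30)),    # 30 minutes
    (datetime(2024, 6, 4, 14, 0), datetime(2024, 6, 4, 15, 30)),  # 90 minutes
]
print(mean_time_to_resolution(incidents))  # 1:00:00
```

Because the mean is sensitive to a few long-running incidents, teams often track the median alongside it.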

Emerging trends

In the last few years there has been a growth in tools designed to help companies control their Cloud spend. By and large, these tools focus on one of the big three cloud providers and provide information about cloud usage to help identify where, for example, cloud services are left running when they are not needed or where purchased capacity is not being used effectively. AWS, Azure and GCP do provide their own tools that customers can use to help manage their cloud bills.
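The kind of check these cost tools perform, such as spotting services left running when they are not needed, can be sketched as follows. The instance records, the 5% average-CPU threshold and the 730-hour month (8,760 hours divided by 12) are assumptions for illustration; real tools pull this data from the providers' billing and monitoring APIs and use richer signals than CPU alone.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    hourly_cost: float   # in the billing currency
    avg_cpu_pct: float   # average CPU utilisation over the look-back period


def idle_instances(instances, cpu_threshold_pct=5.0):
    """Flag instances whose average CPU stays below an (assumed) idle threshold."""
    return [i for i in instances if i.avg_cpu_pct < cpu_threshold_pct]


def monthly_waste(instances, hours_per_month=730):
    """Estimated monthly spend on the given instances."""
    return sum(i.hourly_cost for i in instances) * hours_per_month


fleet = [
    Instance("web-1", 0.125, 45.0),
    Instance("batch-old", 0.5, 1.2),
    Instance("dev-test", 0.25, 0.5),
]
idle = idle_instances(fleet)
print([i.name for i in idle], monthly_waste(idle))  # ['batch-old', 'dev-test'] 547.5
```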

The development of hybrid and multi-cloud infrastructures, and an increased focus on digital transformation and on migrating more legacy applications to the Cloud, require a more sophisticated approach. We are now seeing a move towards tools and platforms that look at performance, cost and risk across multiple clouds, with the ability to understand how legacy workloads would run in the cloud and to factor in risk when managing cloud capacity, as well as to identify wasted spend. This is giving rise to the term FinOps, which we are focusing on in more depth.

We are also seeing an emerging trend in which Observability tools, which originated to help DevOps teams and Site Reliability Engineers (SREs) overcome some of the visibility challenges inherent in containerised micro-services deployments in a public cloud environment, are expanding into a broader set of operational performance monitoring capabilities. This involves a greater focus on DEM and the use of synthetic data to drive active testing, compensating for gaps in visibility caused by the increasing use of public networks outside the direct control of in-house IT operations teams.
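Active (synthetic) testing means running a scripted transaction on a schedule and measuring it, rather than waiting for real users to hit a problem. The sketch below times a single check against a latency objective. The function names, the 500ms objective and the "HTTP 200 within the objective" pass rule are assumptions for illustration; the fetch step is passed in so that the check logic can be exercised without a network.

```python
import time
import urllib.request


def http_fetch(url, timeout_s=5.0):
    """Fetch a URL and return its HTTP status code (raises OSError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def synthetic_check(fetch, url, slo_ms=500.0):
    """Run one scripted transaction and compare it with an (assumed) latency objective."""
    start = time.perf_counter()
    try:
        status = fetch(url)
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses: record as failed
        status = None
    elapsed_ms = (time.perf_counter() - start) * 1000
    ok = status == 200 and elapsed_ms <= slo_ms
    return {"url": url, "status": status, "elapsed_ms": elapsed_ms, "ok": ok}
```

In use, a scheduler would call something like `synthetic_check(http_fetch, "https://shop.example.com/health")` (a hypothetical endpoint) from several network locations, so that degradation on public network paths shows up even when no real user has yet reported it.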

Vendor landscape

Unsurprisingly, the three largest Public Cloud providers have been slow to provide observability solutions that give IT Operations teams a single, end-to-end view of application and business service performance in genuine hybrid and multi-cloud architectures. Microsoft Azure and GCP have started to open up their management tools to allow monitoring of hybrid-cloud applications running both on their own cloud and in customers’ data centres. AWS has been the most reluctant to open up its management tools. However, there were some fairly quiet announcements at re:Invent 2019 about AWS CloudFormation and AWS Config being opened up to support resources outside the AWS Cloud.

Traditional vendors such as IBM, BMC and Broadcom have a genuine ability to monitor performance across all the major Cloud providers and a much wider range of on-premises systems. We should also note that there is a very active and competitive market for Hybrid and Multi-Cloud Management tools from independent vendors such as Dynatrace, Datadog and Virtana, to name but three.

We note that some IT Operations and development departments have built their own monitoring solutions using open-source tools such as Prometheus and Grafana. However, this is a non-trivial exercise in any large-scale hybrid or multi-cloud scenario, and we can’t see the point of trying to build your own when so many commercial solutions are available, particularly when some of them, notably IBM’s, use key elements of open-source monitoring and logging tools.

Commentary

Keepit – and some thoughts on SaaS data recovery

Teradata’s AI Unlimited: An evolution in large scale AI

State of the Cloud YouTube Series

Get used to a complex hybrid cloud environment

Platform.sh

Deeper and wider

BigPanda’s use of Generative AI points the way to autonomous IT oper...

Wing Cloud – abstracting away DevOps complexity

Wing Cloud – a new approach to managing today’s tech complexit...

BigPanda announces Generative AI capabilities

Cisco Live announcements

When is cloud genuinely a cloud?

Interlink Software update

Packets are back in fashion

Observations from Tech Show London 2023

Applications in highly regulated industries

Virtana shows a renewed sense of focus

Sunlight takes distributed Cloud to the far Edge

IBM AIOps

Cisco helps customers understand risks of a poor digital user experien...

eG Innovations gets Observability

Spirent demonstrates that “More for less, faster” is achievable

Cloud Networking Matters

BigPanda doubles-down on core AIOps functions

Solutions

These organisations are also known to offer solutions:

AWS
BMC
Cloudability
CloudHealth
Cloudsoft
Contino
Dynatrace
Exivity
Flexera
Google
HCL
Interlink
Kubecost
ManageEngine
Microsoft
Neos
OpenText
ScienceLogic
SoftwareOne
Spirent
Splunk
Ternary
Thebes
Timspirit
Virtisant
Voxowave
Wing Cloud
Yotascale

Research

Viavi Observer Platform InBrief (June 2024)

Cybersecurity in the cloud - know your threats and know your adversaries

The importance of assuring the digital experience in delivering business services

AI and Generative AI within an Enterprise Information Architecture - Solix and The Operating System for the Enterprise

Data Fabric and the Future of Data Management - Solix Technologies and The Data Layer

TIBCO - The ready-made road to the Edge for enterprise users

Appian

Mimik Technology