Hybrid Infrastructure Management
Analyst Coverage: Paul Bevan
Hybrid Infrastructure Management (HIM) is the practice of monitoring and analysing all your IT infrastructure (servers, storage and networks) in a manner that relates to its impact on the performance and availability of your business applications. This encompasses the performance of private, public and hybrid-cloud infrastructure as well as traditional on-premises configurations. HIM enables IT organisations to deliver relevant, timely performance, health and resource utilisation metrics across all levels of IT and the business.
Why is it important (hot)?
Business relies on IT for the delivery of its customer propositions. Indeed, for many, IT is the business. The growth and scale of cloud computing, the agility provided by micro-services and new deployment technologies, the power of business intelligence (BI) and the immediacy of social media have brought forth new, digital-only business models that operate at a global level, creating new levels of expectation and scale for IT performance in old and new businesses alike.
In this environment, customer experience of the application is critically important. Instrumentation across all key applications and infrastructure is the only way to measure and ensure end-to-end performance and availability. Legacy, silo-specific monitoring tools are no longer adequate as they can’t communicate or relate to one another and have no understanding of the applications that are running on their components. Application Performance Management (APM) tools alone can’t ensure performance and frequently can’t identify the root cause of performance degradation, especially when it is rooted in some part of the I/O path, such as the network or storage infrastructure. With the Internet of Things (IoT), the collaborative and global nature of business, and the specific hardware demands of Artificial Intelligence (AI), Machine Learning and Data Science, that infrastructure is getting more and more complex.
Therefore, being able to monitor and react to performance issues in near real-time and being aware of what parts of this complex IT environment are being used by individual applications is a critical business requirement.
There is also a very important side-effect of a comprehensive HIM system: the ability to benchmark and profile performance characteristics of applications under different workloads and in different environments. This can then be used to develop sophisticated capacity planning models to help reduce the incidence of expensive over-provisioning in public and hybrid-cloud environments.
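As a rough illustration of how profiled workload data might feed a capacity planning check, the Python sketch below compares provisioned capacity with a high percentile of observed usage. The function names, the sample figures and the choice of the 95th percentile are illustrative assumptions, not part of any particular HIM product.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numeric samples (pct in 1..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def headroom_ratio(provisioned, samples, pct=95):
    """Ratio of provisioned capacity to the chosen percentile of observed usage.

    A ratio well above 1 hints at over-provisioning; what counts as
    'well above' is a business decision, not something this sketch decides.
    """
    return provisioned / percentile(samples, pct)


# Hypothetical vCPU usage samples for one application, with one short spike.
vcpus_used = [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 14]
vcpus_provisioned = 16

p95 = percentile(vcpus_used, 95)
ratio = headroom_ratio(vcpus_provisioned, vcpus_used)
print(f"p95 usage: {p95} vCPUs, provisioned/p95 ratio: {ratio:.2f}")
```

Sizing against a high percentile rather than the single worst spike is only one possible policy; a real capacity model would also account for growth, seasonality and the cost of a shortfall.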
How does it work?
Hybrid Infrastructure Management is different from traditional application and network performance management tools in its ability to capture and correlate low-level wire and machine data across an entire IT infrastructure, irrespective of vendor. HIM comprises monitoring, cross-domain correlation and AI-based analytics. It captures granular information, in real-time, on transaction flows from storage arrays, from network devices and between server VMs, across both on-premises and cloud environments, using a combination of hardware and software probes.
The hardware probes tap physically into storage and network fibre connections, while software probes capture information from physical and virtualised devices in an agentless, non-intrusive manner. They ingest huge amounts of real-time data, which is correlated, analysed and presented to IT operations and business managers in user-defined custom views on a single pane of glass.
While HIM systems should be application-aware, it is important to understand that, in a virtualised and micro-services environment, the end-user might be another micro-service. HIM focuses on understanding the infrastructure associated with those service-to-service transactions as well as the overall application performance that the end-user experiences.
With the huge amount of data collected, HIM solutions make use of advanced correlation algorithms, machine learning and predictive analytics to provide sophisticated management information dashboards.
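To make the idea of cross-domain correlation more concrete, here is a minimal Python sketch that ranks infrastructure metrics by how strongly they move with application response time, using a plain Pearson correlation. The metric names and sample values are invented for illustration, and commercial HIM products are likely to use far richer techniques. Correlation of this kind only suggests where to look first; it does not prove a root cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_suspects(app_latency, infra_metrics):
    """Sort infrastructure metrics by absolute correlation with app latency."""
    scores = {name: pearson(app_latency, series)
              for name, series in infra_metrics.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical samples taken at the same six points in time.
app_latency_ms = [120, 125, 118, 310, 420, 130]
infra_metrics = {
    "storage_read_latency_ms": [4, 5, 4, 38, 55, 5],
    "network_rtt_ms": [11, 12, 10, 11, 12, 11],
    "host_cpu_percent": [40, 42, 41, 43, 39, 44],
}

ranked = rank_suspects(app_latency_ms, infra_metrics)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

In this made-up data set the storage read latency tracks the application slowdown almost exactly, so it appears at the top of the ranking, which is the kind of I/O-path problem that application-only monitoring can miss.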
The only real reason to care about infrastructure performance and availability is that they have a significant impact on the business. British Airways and TSB both recently suffered IT problems that had financial impacts of £150 million and £50 million respectively in direct costs. While it is by no means clear that infrastructure performance issues were the major factor here, poor performance can have serious, direct implications. Ten years ago, Amazon was already showing that every 100ms of latency cost them 1% in sales, while Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%.
Performance problems may be attributable to a variety of different components within the IT infrastructure, or to the applications themselves. Therefore, you need to care about the significant problems with siloed operational teams and vendor- or technology-specific tools. The business does not want to hear “not my problem, try that lot over there”. It wants the business issue fixed, quickly and effectively. It wants the root cause of an issue identified quickly and collaboratively, and to see someone put onto addressing it. It doesn’t want to see developers and technicians playing the “blame game”. Ultimately, HIM enables IT to reduce Mean Time to Resolution (MTTR), or, as we have jokingly heard it called, “Mean Time to Proving Innocence”.
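For readers who want to track MTTR themselves, the short Python sketch below shows one common way to calculate it: the average time between an incident being detected and being resolved. The incident timestamps are invented, and the choice of detection as the starting point is an assumption, since organisations define the start of the clock differently.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents),
                timedelta())
    return total / len(incidents)


# Hypothetical incidents: (time detected, time resolved).
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),     # 30 minutes
    (datetime(2024, 3, 4, 14, 15), datetime(2024, 3, 4, 15, 45)),  # 90 minutes
    (datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 9, 23, 0)),    # 60 minutes
]

mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr}")
```

Comparing this figure before and after a change in tooling or team structure is a simple way to check whether root causes are really being found faster.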
CIOs need to get clear agreement on business performance service levels that help build customer trust. For example, an alternative payments company doesn’t worry about one transaction being slow; it worries that once a customer uses a credit card because the alternative payment is slow, they will stick with using credit cards, and perhaps influence their friends to do the same. In such circumstances, the company won’t be concerned about finding someone, or a department, to blame. It will want to inculcate a network of TRUST through the stack, from customer to infrastructure provider, that ensures teams are collaborating on solving the business issue.
In parallel, an evaluation of current performance monitoring tools needs to be carried out. HIM solutions can replace the need for many legacy monitoring tools. Vendor-specific tools might still be needed for a deeper dive where, for example, the infrastructure management dashboard has indicated that a particular problem is centred around disk contention in one specific application. However, the aim should be to find a comprehensive, vendor-independent HIM solution, with a wide range of integrations and published APIs, that helps to minimise the number of tools and ensures that the performance, health and utilisation of the whole IT infrastructure can be viewed through a single pane of glass.
The Bottom Line
Emerging Hybrid Infrastructure Management solutions will enable IT to deliver customer-focused service levels that manage both availability and performance, even on virtualised and cloud-based infrastructure that may not be directly under its control.