Storage optimisation

This is the first of two articles about storage optimisation. In this article I will discuss what the issue is and in the second I will consider how vendors are (or, mostly, are not) addressing the real problems that users are facing.

Take a simple scenario in which you have three applications running against three separate databases, each of which has its own 1TB disk. Suppose further that the databases require 700GB, 150GB and 600GB respectively. The 150GB database used to be a lot bigger but you’ve recently introduced an archival product that has allowed you to significantly reduce its space requirements. It isn’t hard to see that you could move that database to either of the other two disks (giving 850GB or 750GB in use, both comfortably under 1TB) and so free up one of the disks for some other purpose.
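To make the arithmetic concrete, here is a minimal sketch of the three-database scenario above. It uses a simple first-fit-decreasing pass, which is my choice for illustration and not something any particular product is known to do. The function name, the application names and the assumption that 1TB means 1000GB are all hypothetical, and the sketch ignores headroom, I/O load and everything else a real plan would have to consider.

```python
def consolidate(databases_gb, disk_capacity_gb):
    """Place databases on disks using a first-fit-decreasing heuristic.

    Returns a list of disks, each a list of (name, size_gb) tuples.
    Illustrative only: capacity is the sole constraint considered.
    """
    disks = []
    # Largest databases first, so the small ones fill the remaining gaps.
    for name, size in sorted(databases_gb.items(), key=lambda kv: kv[1], reverse=True):
        if size > disk_capacity_gb:
            raise ValueError(f"{name} ({size}GB) does not fit on any disk")
        for disk in disks:
            used = sum(s for _, s in disk)
            if used + size <= disk_capacity_gb:
                disk.append((name, size))
                break
        else:
            # No existing disk has room, so bring another disk into use.
            disks.append([(name, size)])
    return disks


# Hypothetical names for the three applications in the scenario.
layout = consolidate({"app_a": 700, "app_b": 150, "app_c": 600}, disk_capacity_gb=1000)
for number, disk in enumerate(layout, start=1):
    print(f"disk {number}: {disk} -> {sum(s for _, s in disk)}GB used")
```

With these figures the pass needs only two disks, leaving the third free, which is the saving described above.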

Now multiply these three disks by the thousands of such disks that are likely to be in place in any large organisation. The question becomes: how do you keep track of all the spare capacity you gain whenever archival is implemented, data is deleted as no longer relevant, a disk is de-fragmented, or space is freed up for any of a number of other reasons?

To answer this question you have to bear in mind that new applications are being added, that databases are growing and that virtualisation means that resources are being re-used on a regular basis.

Now we need to ask another question: why would you need to know about your spare capacity in the first place? The answer is that you want to be able to optimise the capacity you have so that it can be used to best advantage.

That’s only a part of the problem. Knowing about spare capacity and acting on it are two different things. Typically, you need to plan data migration across storage devices: you can’t just move stuff willy-nilly. But, and here is the rub, planning such migrations is usually a manual process that can take weeks. By that time all the information you had about your disk usage is out-of-date, so your plans will be only partially useful at best and obsolete at worst.

However, we are putting the cart before the horse. We are assuming that the information you can gather about your current storage usage is itself up-to-date and the fact is that it almost certainly isn’t. This is because (unless you are a very rare exception) you will undoubtedly have storage devices from multiple vendors and, while each of those vendors probably has pretty decent capabilities for discovering and investigating details pertaining to its own hardware devices, it equally probably has pretty lousy capabilities when it comes to finding out anything useful about competitors’ hardware. All of which means that you have multiple discovery products, none of which works with the others, so you end up producing a bunch of reports which you have to collate by manually entering the data from each separate report into (probably) a spreadsheet. Given the amount of data, this can take weeks and is notoriously error-prone; and even if, by some miracle, it were all correct, it would still be out-of-date given the dynamic nature of storage environments.
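As a rough illustration of what that collation involves, the sketch below maps two invented vendor report formats onto one common record. Everything here is an assumption made for the example: the vendor names, the field names (`lun`, `size_gb`, `capacity_mb`, `free_pct`) and the units are hypothetical and are not taken from any real discovery product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeUsage:
    """A common record that every vendor report is translated into."""
    vendor: str
    volume: str
    capacity_gb: float
    used_gb: float

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


def from_vendor_a(row):
    # Hypothetical format: sizes already in GB.
    return VolumeUsage("vendor_a", row["lun"], float(row["size_gb"]), float(row["used_gb"]))


def from_vendor_b(row):
    # Hypothetical format: capacity in MB, plus the percentage still free.
    capacity_gb = row["capacity_mb"] / 1000
    used_gb = capacity_gb * (1 - row["free_pct"] / 100)
    return VolumeUsage("vendor_b", row["name"], capacity_gb, used_gb)


# One adapter per vendor; adding a vendor means adding one entry here.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def collate(reports):
    """Turn {vendor: [rows]} into a single flat list of VolumeUsage records."""
    return [ADAPTERS[vendor](row) for vendor, rows in reports.items() for row in rows]


inventory = collate({
    "vendor_a": [{"lun": "lun-01", "size_gb": 1000, "used_gb": 700}],
    "vendor_b": [{"name": "vol7", "capacity_mb": 1_000_000, "free_pct": 40}],
})
total_free_gb = sum(v.free_gb for v in inventory)
print(f"{len(inventory)} volumes, {total_free_gb:.0f}GB free in total")
```

The point is simply that once every report is translated into the same shape, questions about total spare capacity become a one-line sum instead of a spreadsheet exercise.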

What’s needed is another way. First, you need heterogeneous discovery that will find out about your storage usage regardless of who provided the hardware, so that collation of that data is automatic and timely. Secondly, to take a leaf out of the database world, you will need autonomics that will analyse your existing environment for you, make recommendations as to how you might optimise it and then, once you have selected your preferred option, action it (with, of course, appropriate scheduling and so forth) for you. Put those two things together, heterogeneous discovery with autonomics, and you have the beginnings of a solution to the storage optimisation problem.

However, that’s just a baseline: any such solution would need to be aware not just of storage devices but also of servers and any virtualisation software that is in place, and of how it is being used. In other words, it needs to understand the entire environment and how it pertains to storage. Further, you don’t just implement storage optimisation against a blank sheet of paper: there will be corporate policies and governance principles in place, as well as service level agreements and the like, so you will need a solution that is aware of, and can cater for, these aspects of operations management. What’s more, you need a solution that can execute your chosen recommendation, which means automatically migrating data across heterogeneous devices and servers. In other words, you need heterogeneity at the front-end and the back-end as well as some clever analytics in the middle.

Nor is this the end of the matter, because this isn’t a one-off process. You don’t just optimise your assets once and say that’s job done. Optimisation is an on-going process. Fortunately, it’s easier to manage in that way: once you start from an optimised state you monitor changes as they occur and iteratively tune the environment to ensure it remains optimised.
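The monitoring step can be pictured as comparing each new usage snapshot with the optimised baseline and flagging only what has drifted. The sketch below is a minimal illustration under assumptions of my own: the function name, the disk names and the 50GB tolerance are hypothetical, and a real tool would also have to weigh policies and service levels.

```python
def volumes_needing_review(baseline_gb, current_gb, tolerance_gb=50):
    """Return {volume: change_in_gb} for volumes whose usage has drifted.

    A volume missing from either snapshot is treated as using 0GB there,
    so newly added or retired volumes are flagged as well.
    """
    flagged = {}
    for volume in baseline_gb.keys() | current_gb.keys():
        before = baseline_gb.get(volume, 0)
        after = current_gb.get(volume, 0)
        if abs(after - before) > tolerance_gb:
            flagged[volume] = after - before
    return flagged


# Hypothetical snapshots: the optimised baseline and a later reading.
baseline = {"disk1": 850, "disk2": 600, "disk3": 0}
current = {"disk1": 860, "disk2": 720, "disk3": 0, "disk4": 300}

changes = volumes_needing_review(baseline, current)
for volume, delta in sorted(changes.items()):
    print(f"{volume}: {delta:+d}GB since baseline")
```

Here only disk2 (which has grown by 120GB) and the new disk4 are flagged; the 10GB change on disk1 falls within the tolerance and is ignored, so re-tuning effort goes only where the environment has actually moved.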

So, that’s the issue with storage optimisation: what you would like to be able to do, and some of the major considerations that would apply to any solution. I’ll discuss the market landscape in my next article.