Content Copyright © 2007 Bloor. All Rights Reserved.
I have sometimes asked what some see as a daft question concerning storage: “When is someone going to come up with a solution that will completely stop storage growth rather than just slowing it down a little?” Usually, the response is a deafening silence. But I predict: one day your organisation WILL reduce its total storage.
For some years the growth has been masked and contained by hardware miniaturisation and capacity hikes, virtualisation to increase disk utilisation and so on. These measures have served to put off the day when enterprises are forced to face up to the need to properly identify the information they hold that has no value to the organisation, and to remove it, just to keep storage capacity at manageable levels.
But why wait for the eleventh hour? The goal should already be to identify and keep only the information of value to the company. Yet hard-pressed storage managers are often too busy fighting fires to take time out to consider how to go about analysing what information they are holding in order to identify what they can cull.
Stop-gaps only put off the evil day
For now the scenario is: data volumes and storage costs increase, as does the time taken to search for and extract valuable information. This is exacerbated by an increasing requirement for compliance, with new regulations leading to otherwise valueless information needing to be retained. Adding faster and/or higher capacity storage hardware may drag performance back up, but only for a while.
As well as longer search times, there are longer backup times. Continuous data protection (CDP) has recently gained favour partly because it can obviate the need for a storage backup window by effectively spreading the backup throughout the day. Without CDP some companies were staring at the prospect of daily backups taking more than the time available to complete them. Yet CDP does not reduce the total daily workload nor the growing storage requirement.
The reliance on being able to hive off most information to off-line tape libraries has also been undermined by more information needing to be available longer for rapid retrieval, with compliance a major reason. In any case, even tape libraries need security and some environmental control to protect the media and the information they contain; that too has a cost.
The biggest increase in storage is now identified as coming from e-mails, instant messaging and the like. One recent estimate is that 75% of a company’s intellectual property is now held within e-mails; most major litigation in the US now involves disclosure of e-mails as evidence.
Increasingly sophisticated e-mail and messaging storage management systems have managed to hold back the increase in total storage e-mail should cause—by, for instance, holding each message and its attachments once only, with multiple recipients just receiving a tag that links back. Data compression has also helped reduce disk space at the expense of some performance.
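The single-instance technique described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: it assumes a content-hash keyed store, with each recipient's mailbox holding only a tag that links back to the one stored copy.

```python
import hashlib

class SingleInstanceStore:
    """Sketch of single-instance e-mail storage: each message body or
    attachment is held once, keyed by its content hash; every recipient's
    mailbox holds only the tag (the hash) linking back to it."""

    def __init__(self):
        self.blobs = {}      # tag -> content, stored once per unique body
        self.mailboxes = {}  # recipient -> list of tags

    def deliver(self, recipients, content):
        tag = hashlib.sha256(content).hexdigest()
        # setdefault stores the body only on first sight; duplicates
        # arriving later cost one tag per recipient, not a full copy
        self.blobs.setdefault(tag, content)
        for r in recipients:
            self.mailboxes.setdefault(r, []).append(tag)
        return tag

    def read(self, recipient, tag):
        if tag not in self.mailboxes.get(recipient, []):
            raise KeyError("recipient has no such message")
        return self.blobs[tag]

store = SingleInstanceStore()
body = b"quarterly report attached"
t = store.deliver(["alice", "bob", "carol"], body)
# three recipients, but the body is held exactly once
assert len(store.blobs) == 1
assert store.read("bob", t) == body
```

The saving is clearest with large attachments sent to whole distribution lists: the store grows by one copy, however many mailboxes reference it.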
Nevertheless, I keep returning to the fact that the total storage pool keeps growing. So, in general, does the time to retrieve any information of value to the organisation. Add more equipment and you add to the running costs, for powering the equipment and cooling the surrounding data centre or office. An increasingly common problem is data centres approaching maximum capacity—meaning they will hit a literal brick wall short of a hugely expensive and disruptive relocation to bigger premises.
Oh, and I should mention, though it is still way down most companies' priority lists, that the increased power generation burns up fossil fuel and so increases atmospheric CO2.
Hardware producers see money in countering power usage. So, for instance, Pillar Data Systems plans to add 'sleepy' drives to its Axiom disk arrays in the second half of 2007: SATA disks that will spin slowly when not in use and so save power. They differ from massive array of idle disks (MAID) products, which shut the disks down completely. That may save more power, depending on how long the disks sit idle before reuse, but, according to Pillar, repeated full shut-downs and power-ups impact on drive durability, which means higher replacement costs. Neither approach does anything to solve the growing storage requirement.
Information classification and policies are the way ahead
Various information lifecycle management (ILM) initiatives have been introduced over recent years. The broad aim of these has been to try to tackle the information value problem while also reducing storage costs. At best they have helped reduce the amount of information held on the most costly tier one disk storage and thereby improved the speed of retrieving the rest while reducing overall running costs to some extent.
ILM is an overused and often misused term. It is a goal, not a single solution. However, there is one core feature gradually emerging and being built into more and more products: automation of the information classification process, which in practice means attaching metadata to each piece of information to tell the system what it is, or contains. This overlaps with the work carried out by content management systems (ECM and ICM).
From this metadata, a storage management system can apply corporate policies captured in a policy engine to automate information movement between tiers of storage—and especially to move data out to off-line storage and ultimately destroy it safely and much more quickly. Only by building up—gradually—to applying such techniques across the enterprise will total storage capacity ultimately be contained.
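The interplay of classification metadata and a policy engine can be illustrated with a short sketch. The rule set, field names and tier actions below are all hypothetical examples, not any product's actual policies: each piece of information carries metadata from the classification step, and the engine returns the first matching action.

```python
# Hypothetical corporate policies, evaluated in priority order:
# each entry pairs a predicate on the classification metadata
# with the storage action to take when it matches.
POLICIES = [
    (lambda m: m["class"] == "regulatory",  "retain_tier2"),   # compliance hold
    (lambda m: m["age_days"] > 365 * 7,     "destroy"),        # past retention
    (lambda m: m["age_days"] > 90,          "move_offline"),   # rarely accessed
    (lambda m: True,                        "keep_tier1"),     # default: fast disk
]

def apply_policy(metadata):
    """Return the first action whose predicate matches the metadata
    attached to a piece of information by the classification step."""
    for predicate, action in POLICIES:
        if predicate(metadata):
            return action

# An old general document is flagged for safe destruction; a regulated
# one of the same age is retained instead.
assert apply_policy({"class": "general", "age_days": 3000}) == "destroy"
assert apply_policy({"class": "regulatory", "age_days": 3000}) == "retain_tier2"
assert apply_policy({"class": "general", "age_days": 30}) == "keep_tier1"
```

The point of the ordering is that compliance rules trump cost rules: destruction only happens when no retention obligation matches first, which is what makes automated culling safe enough to run across the enterprise.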
Some of this work is in its infancy and, so far, there are no information classification or taxonomy standards that any products adhere to. Yet products are moving in this direction. (Bloor’s recent paper on ILM provides a full description of the current status of the ILM market, some of the initiatives in progress, and what companies should be doing to get a handle on the information they already have.)
Last week’s release of Symantec’s Enterprise Vault 7.0 is a case in point. This product, designed to manage e-mail storage—and now covering other information types including instant messaging—includes classification and a policy engine for the first time. Broadly speaking, this approach needs to extend to all types of storage information.
What is often forgotten is that, however much information can be retrieved, there is in any case a limit to a business's capacity to make use of it. The way forward is to tackle head-on the information you have now as well as the new information arriving daily, to hone it down to a slim-line storage pool which carries only the data that is absolutely necessary, and to make the most valuable information the easiest to retrieve.
Achieving that will variously reduce the costs for power, equipment upgrades, software, management time, maintenance and so on, while also improving performance and the business value of the information.
Common sense says a small investment now is likely to produce a long-term ROI. In any case, sooner or later you will have no choice. Unless you turn the tide and reduce your total storage, your enterprise systems will collapse under the storage burden.