Storage de-duplication can be applied in many situations, and de-dupe specialist Data Domain is having to work hard to prioritise which new features to deliver from the opportunities it is seeing.

The starting point is its NAS-style de-duplication storage appliances, which can be installed with minimum disruption to an organisation's existing way of working. For instance, the appliance carries out in-line de-dupe transparently within an unchanged backup procedure. The company says this will typically achieve an immediate 20x saving in backup disk capacity and requires no management.

So my question is: "Why wouldn't you?" Yes, you have to pay for the de-dupe appliance, but the massive disk capacity savings achieved mean avoiding future disk drive purchases. In turn this can, for instance, greatly defer the day when your data centre runs out of capacity (space, energy), so it also fits well with a green IT policy.

Data Domain also uses this de-dupe process for a virtual tape library (VTL). The huge disk capacity saving means data can be economically retained on disk (nearline storage) for, perhaps, months before there is a need for it to go into deep tape (or optical) archive. In the meantime it is much more rapidly recoverable and accessible. With the data taking, say, 1/20th the capacity on low-cost SATA disk compared with ‘un-deduped’ tape, the economics of disk versus tape are radically altered in disk's favour.

In both cases the data is accessible reasonably fast, so the appliance provides a nearline tier which many applications can access directly; for instance, Data Domain has partnerships with a couple of content search engine providers. Searches of stored content are useful as input to discovery, for example when gathering evidence for a compliance court case.

A new Data Domain feature is Retention Lock; this can set a lock on individual files as they are archived so that they cannot be changed in any way for a pre-set period. Since the lock is open for the IT manager to set or change, it is not suited to rigorous SEC-level compliance, but it helps ensure good governance since it will firmly block user access. The company also uses a partner to provide encryption. Together these steps show Data Domain making at least tentative moves into accommodating governance, risk and compliance (GRC) needs. A verifiable-delete facility for data destruction is also planned this year.

In fact de-dupe is equally at home with archiving as with backup, although the nature of archiving means the space saving, typically 75–80% (4x to 5x), is much lower than for backup; but it's still impressive. Moreover, the process is also helping remove the demarcation between backup and archive systems which, at least longer term, should help simplify the management process.
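The reduction figures quoted here are given both as multiples (20x) and as percentages (75–80%), and the two are easy to confuse. As a quick check of the arithmetic only, the sketch below converts between them; the function names are my own and are not taken from any Data Domain tool.

```python
def percent_saved(reduction_ratio: float) -> float:
    """Percentage of raw capacity saved for a reduction ratio such as 20 (meaning 20x)."""
    if reduction_ratio < 1:
        raise ValueError("reduction ratio must be at least 1")
    return 100 * (reduction_ratio - 1) / reduction_ratio


def ratio_from_percent(percent: float) -> float:
    """Reduction ratio (e.g. 4.0 for 4x) that corresponds to a percentage saving."""
    if not 0 <= percent < 100:
        raise ValueError("percentage saved must be in the range [0, 100)")
    return 100 / (100 - percent)


if __name__ == "__main__":
    for ratio in (4, 5, 20):
        print(f"{ratio}x reduction saves {percent_saved(ratio):.0f}% of capacity")
```

So a 75% saving corresponds to 4x, an 80% saving to 5x, and the 20x claimed for backup to a 95% saving.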

A further benefit is that sending either a backup or an archive copy to a remote location, even over a WAN, becomes practical. Now add a frequent snapshot capability which sends hardly any data, as it only needs to store data tags, and you nearly have continuous data protection (CDP) and a very low-cost disaster recovery (DR) solution. You also obviate any need to physically transport newly-created tapes to a remote secure location, since the information is sent over the wire instead.
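To see why so little data needs to cross the WAN, here is a minimal sketch of the general idea: the sender fingerprints each chunk and transmits only those the remote site does not already hold. This is an illustration under my own assumptions (fixed-size chunks, SHA-256 fingerprints, a dictionary standing in for the remote appliance), not a description of Data Domain's replication protocol.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint (the 'tag') for a chunk; SHA-256 is an assumed choice."""
    return hashlib.sha256(chunk).hexdigest()


def replicate(data: bytes, remote_chunks: dict, chunk_size: int = 4096):
    """Send `data` to a remote store, transferring only chunks it lacks.

    `remote_chunks` is a hypothetical stand-in for the remote site: a dict
    mapping fingerprint -> chunk bytes. Returns (recipe, payload_bytes_sent),
    where recipe is the ordered list of fingerprints needed to rebuild `data`.
    The small cost of sending the fingerprints themselves is not counted.
    """
    recipe = []
    payload_bytes_sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        tag = fingerprint(chunk)
        if tag not in remote_chunks:
            # Only previously unseen chunks cross the wire.
            remote_chunks[tag] = chunk
            payload_bytes_sent += len(chunk)
        recipe.append(tag)
    return recipe, payload_bytes_sent


if __name__ == "__main__":
    remote = {}
    _, first = replicate(b"AAAABBBBCCCC", remote, chunk_size=4)
    _, second = replicate(b"AAAABBBBDDDD", remote, chunk_size=4)
    print(first, second)  # the second copy sends only its one changed chunk
```

In this toy run the first copy sends all 12 bytes, while the second, which differs in one chunk, sends only 4.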

All these are possible only because the specially-designed appliance, which draws heavily on CPU performance, achieves the necessary throughput to carry out block- and byte-level de-dupe in-line as the data is received. Any vendor providing only a software solution cannot achieve this throughput—and building an optimised appliance is not an overnight job. The alternative, so-called ‘post-processing’ de-dupe that only works on the already backed-up storage, has very little value in my book, as it needs to allocate more disk space and incurs extra management.
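For readers unfamiliar with what in-line de-dupe involves, the sketch below shows the principle in its simplest form: each incoming chunk is fingerprinted as it arrives and is written to disk only if that fingerprint has not been seen before. It is a toy under stated assumptions (fixed-size chunks, SHA-256, everything held in memory, a class name of my own invention) and says nothing about how Data Domain's appliance is actually built; production systems typically use variable-size segments and far more elaborate indexing, which is where the CPU demand comes from.

```python
import hashlib


class InlineDedupeStore:
    """Toy in-line de-duplicating store (hypothetical, for illustration only)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size  # assumed fixed chunk size
        self.chunks = {}              # fingerprint -> unique chunk bytes
        self.files = {}               # file name -> ordered list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        """De-duplicate `data` as it is received and record how to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            tag = hashlib.sha256(chunk).hexdigest()
            if tag not in self.chunks:
                # New content: store it once. Duplicates cost only a tag.
                self.chunks[tag] = chunk
            recipe.append(tag)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        """Reassemble a file from its stored chunks."""
        return b"".join(self.chunks[tag] for tag in self.files[name])

    def stored_bytes(self) -> int:
        """Bytes of unique chunk data actually held."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = InlineDedupeStore(chunk_size=4)
    store.write("monday", b"AAAABBBBCCCC")
    store.write("tuesday", b"AAAABBBBDDDD")
    print(store.stored_bytes())  # 16 bytes held for 24 bytes written
```

Because duplicates are discarded before anything is written, no extra landing space is needed, which is the contrast with post-processing drawn above.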

So, notwithstanding the economic downturn and with storage volumes set to continue soaring, Data Domain looks to be sitting pretty right now.

What of the future? Clearly, since applications can already access de-duped nearline storage in real time, there are few technical reasons stopping de-dupe being applied to tier one (even tier zero) storage and saving yet more space—except in considering when to accomplish the de-dupe. (No immediate plans for this I'm told.) What I do know is that Data Domain's own users are thinking outside the (storage) box to pass on their ideas—so some highly original future developments are entirely possible.

