How Data Domain de-dupe is upping its nearline capabilities

Written By: Peter Williams
Published:
Content Copyright © 2008 Bloor. All Rights Reserved.

Storage de-duplication
has the potential to be used in lots of situations, and de-dupe specialist
Data Domain is having to work hard to prioritise new features from among
the opportunities it is seeing.

The starting point
is its NAS-style de-duplication storage appliances, which can be installed with minimum
disruption to an organisation’s existing way of working. For instance, an
appliance carries out in-line de-dupe transparently within an unchanged
backup procedure. The company says this will typically achieve an immediate 20x
reduction in backup disk usage and requires no management.

So my question is:
“Why wouldn’t you?” Yes, you have to pay for the de-dupe appliance, but the massive
disk capacity savings achieved mean avoiding future disk drive purchases. In
turn this can, for instance, greatly defer the day when your data centre runs
out of capacity (space, energy), so it also fits well with a green IT policy.

Data Domain also
uses this de-dupe process for a virtual tape library (VTL). The huge disk
capacity saving means data can be economically retained on disk—nearline storage—for, perhaps, months before there is a need for it to go into deep tape (or
optical) archive. In the meantime it is much more rapidly recoverable and
accessible. With the data taking, say, 1/20th the capacity on low-cost
SATA disk compared with ‘un-deduped’ tape, the economics of disk versus
tape are radically altered in disk’s favour.

In both cases the
data is accessible reasonably fast, so it provides a nearline tier which can be
accessed directly for many applications; for instance Data Domain has
partnerships with a couple of content search engine providers. Searches of stored content
are useful in discovery, for instance to supply evidence for a compliance court case.

A new Data Domain
feature is Retention Lock; this can set a lock on individual files as they are
archived so that they cannot be changed in any way for a pre-set period. Since
the IT manager can set or change the lock, it is not suited to rigorous SEC-level
compliance, but it helps ensure good governance since it will firmly block user
access. The company also uses a partner to provide encryption. Together these
steps show Data Domain making at least tentative moves into accommodating governance,
risk and compliance (GRC) needs. A verifiable-delete facility for data destruction
is also planned this year.

In fact de-dupe is
equally at home with archiving as with backup, although the nature of archiving
means the space saving, typically 75–80% (roughly 4x to 5x), is much lower than for
backup, but it’s still impressive. Moreover, the process is also helping remove
the demarcation between backup and archive systems which, at least longer term,
should help simplify the management process.
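
The relationship between an "Nx" reduction and a percentage saving is simple arithmetic, and a short sketch makes the figures quoted above easier to compare. The function names below are illustrative only and are not taken from any Data Domain tool.

```python
def saving_fraction(ratio: float) -> float:
    """Fraction of disk capacity saved for a reduction ratio (e.g. 20 for 20x)."""
    if ratio < 1:
        raise ValueError("reduction ratio must be at least 1")
    return 1.0 - 1.0 / ratio


def reduction_ratio(saving: float) -> float:
    """Reduction ratio implied by a fractional saving (e.g. 0.75 for 75%)."""
    if not 0 <= saving < 1:
        raise ValueError("saving must be in the range [0, 1)")
    return 1.0 / (1.0 - saving)
```

On this arithmetic a 20x backup reduction corresponds to a 95% saving, while savings of 75% and 80% correspond to 4x and 5x respectively.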

A further benefit
is that sending either a backup or archive copy to a remote
location, even over a WAN, becomes practical. Now add a frequent snapshot
capability which sends hardly any data as it only needs to store data tags, and
you nearly have continuous data
protection (CDP) and a very low-cost disaster
recovery (DR) solution. You also obviate any need to physically transport newly-created
tapes to a remote secure location—by sending the information over the wire.
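
To see why a snapshot might send "hardly any data", here is a minimal sketch of the general idea. It assumes that the "data tags" are content fingerprints of fixed-size blocks and that the remote site can report which fingerprints it already holds. The block size, the choice of SHA-256 and the function name `plan_replication` are all assumptions made for illustration, not a description of Data Domain's replication protocol.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def plan_replication(data: bytes, remote_fingerprints: set):
    """Work out what must cross the wire for one snapshot.

    Returns (recipe, to_send): the ordered list of block fingerprints that
    describes the snapshot, and the blocks the remote site does not yet hold.
    """
    recipe = []
    to_send = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        recipe.append(fp)  # the tag is always sent
        if fp not in remote_fingerprints:
            to_send[fp] = block  # block contents are sent only when new
    return recipe, to_send
```

In this sketch, a second snapshot that differs from the first in a single block sends the full list of tags but only one block of actual data.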

All these are
possible only because the specially-designed appliance, which draws heavily on
CPU performance, achieves the necessary throughput to carry out block- and
byte-level de-dupe in-line as the data is received. Any vendor providing only a
software solution cannot achieve this throughput—and building an optimised appliance
is not an overnight job. The alternative, so-called ‘post-processing’ de-dupe
that only works on the already backed-up storage, has very little value in my
book, as it needs to allocate more disk
space and incurs extra management.
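
As a rough illustration of what in-line, block-level de-dupe involves, the sketch below fingerprints each block as it arrives and stores a block only if that fingerprint has not been seen before. It is a toy model under stated assumptions (fixed 4 KB blocks, SHA-256 fingerprints, everything held in memory, a hypothetical `DedupeStore` class) and is not Data Domain's algorithm, which the article does not describe in detail.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may segment differently


class DedupeStore:
    """Toy in-line de-dupe store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}         # fingerprint -> block bytes
        self.files = {}          # file name -> ordered list of fingerprints
        self.bytes_received = 0  # total bytes written by clients

    def write(self, name: str, data: bytes) -> None:
        """De-dupe the data as it is received, before anything is stored."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # store only unseen blocks
            recipe.append(fp)
        self.files[name] = recipe
        self.bytes_received += len(data)

    def read(self, name: str) -> bytes:
        """Rebuild a file from its list of fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def bytes_stored(self) -> int:
        return sum(len(block) for block in self.blocks.values())

    def ratio(self) -> float:
        """Reduction ratio so far (bytes received / bytes actually stored)."""
        stored = self.bytes_stored()
        return self.bytes_received / stored if stored else 0.0
```

Writing two nightly backups that differ in only one block stores the shared blocks once, so the ratio climbs as more near-identical backups arrive. The per-block hashing in the write path also hints at why the approach draws heavily on CPU performance.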

So, notwithstanding
the economic downturn and with storage volumes set to continue soaring, Data Domain
looks to be sitting pretty right now.

What of the
future? Clearly, since applications can already access de-duped nearline
storage in real time, there are few technical obstacles to applying de-dupe
to tier one (even tier zero) storage and saving yet more space, apart from
deciding when to carry out the de-dupe. (No immediate plans for this, I’m
told.) What I do know is that Data Domain’s own users are thinking outside the (storage)
box to pass on their ideas—so some highly original future developments are
entirely possible.