Data Domain’s souped-up OS to dent ExaGrid’s de-dupe performance challenge?

Written By: Peter Williams
Published:
Content Copyright © 2009 Bloor. All Rights Reserved.

De-duplication (de-dupe) appliance market leader Data Domain today releases
new platform operating system software for its appliances, which it says will
boost performance by 50–100%. A simple software upgrade to the new DD OS
version 4.6 is available to users at no extra charge.

This is an important development for Data Domain. While it has been busy
pushing up into the enterprise market, it has faced increasing competition in
the SMB market, especially from ExaGrid, now rated number two in terms of
de-dupe users, whose appliance has outperformed Data Domain’s in some live
tests. Although the two vendors’ approaches differ, they appear much the
same to the user.

Data Domain gave an example of the potential speed improvement. Using its
top-of-the-range DD690 system alongside Symantec NetBackup OpenStorage (OST)
on a 10Gb Ethernet line, it now claims backup throughput of up to
760MB/second (2.7TB/hour), which is 90% faster than the DD690’s best
throughput when it was introduced in May last year.
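
The quoted figures can be cross-checked with some simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1TB = 1,000,000MB), and the "implied earlier peak" is derived purely from the 90% claim rather than taken from any Data Domain specification. The function names are made up for this example.

```python
# Sanity check of the throughput figures quoted in the article.
# Assumption: decimal units, i.e. 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def mb_per_s_to_tb_per_hour(mb_per_s: float) -> float:
    """Convert a rate in MB/second to TB/hour."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_TB


def implied_baseline(new_rate: float, percent_faster: float) -> float:
    """Rate that new_rate would be 'percent_faster' percent faster than."""
    return new_rate / (1 + percent_faster / 100)


new_rate_mb_s = 760  # claimed peak with DD OS 4.6
tb_per_hour = mb_per_s_to_tb_per_hour(new_rate_mb_s)
baseline_mb_s = implied_baseline(new_rate_mb_s, 90)

print(f"{tb_per_hour:.2f} TB/hour at {new_rate_mb_s} MB/second")
print(f"{baseline_mb_s:.0f} MB/second implied earlier peak")
```

On these assumptions, 760MB/second works out at a little over 2.7TB/hour, and a 90% gain implies an earlier peak of about 400MB/second.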

Data Domain was a little coy on how it had achieved such an increase without
any change of hardware. Brian Biles, Data Domain’s VP of Product Management,
paid tribute to its proprietary Stream Informed Segment Layout™ (SISL)
technology, which is CPU-centric and software-based, so the gain has to have
been entirely down to software improvements.

Data Domain’s appliances carry out de-dupe ‘in-line’ with data backup,
meaning that they convert the data as it is received, so speed of throughput
is especially important. The de-dupe process has to keep pace with the data
it receives or the backup will slow down, which runs counter to the need to
fit shrinking backup windows. Equally critical is the speed of the
‘un-de-dupe’ restore process.
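
For readers unfamiliar with the mechanics, the following is a minimal sketch of the general idea of in-line de-dupe. It is not Data Domain’s SISL implementation, which is proprietary. The fixed 4KB segment size, the SHA-256 fingerprint and the class name InlineDedupeStore are all assumptions chosen to keep the illustration short.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size, for illustration only


class InlineDedupeStore:
    """Toy in-line de-dupe store: data is de-duped as it is received."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (stored once)
        self.backups = {}   # backup name -> ordered list of fingerprints

    def ingest(self, name, stream):
        """De-dupe 'stream' segment by segment while it is arriving."""
        recipe = []
        for offset in range(0, len(stream), CHUNK_SIZE):
            segment = stream[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # Only previously unseen segments consume disk space.
                self.segments[fingerprint] = segment
            recipe.append(fingerprint)
        self.backups[name] = recipe

    def restore(self, name):
        """'Un-de-dupe': rebuild the original stream from its segments."""
        return b"".join(self.segments[fp] for fp in self.backups[name])

    def stored_bytes(self):
        return sum(len(segment) for segment in self.segments.values())


store = InlineDedupeStore()
monday = b"A" * 8192 + b"B" * 4096   # made-up backup data
tuesday = b"A" * 8192 + b"C" * 4096  # mostly unchanged the next day
store.ingest("monday", monday)
store.ingest("tuesday", tuesday)

print(len(monday) + len(tuesday), "bytes received")
print(store.stored_bytes(), "bytes stored")
```

The fingerprinting sits directly in the ingest path, which is why throughput matters so much for this approach, and the restore has to reassemble the stream from its segments before it can hand the data back.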

“Data Domain continues to bypass the disk I/O bottleneck and instead rides
the CPU price/performance curve,” said Biles, in a reference to the way SISL
boosts performance every time the number of CPU cores increases (although
the core count has not changed here). “This announcement reconfirms the
power of our optimised in-line de-duplication approach.”

Most competitors do not attempt ‘in-line’ de-dupe. They wait for the backup
to complete and then carry out the de-dupe process on the completed backup
(‘post-process’ de-dupe). This approach avoids any slowing of the backup but
requires extra ‘interim’ disk space and takes longer overall before it gets
to the tiny footprint of the de-duped backup. Data Domain’s approach is more
intuitive and installs transparently alongside the existing way of working.

ExaGrid uses a hybrid approach. Its appliance sits ‘in-line’ to the backup
but is effectively divided in half, doing an internal ‘post-process’
de-dupe. One half captures the backup data straight onto its internal disk,
using its grid architecture, which makes this very fast. Only then does the
appliance start de-duping the data, outputting it so that only the de-duped
data arrives at the destination system (as with ‘in-line’). If there is a
series of backups, the first can be de-duped while the next is received and,
overall, the backup may outperform a straight ‘in-line’ approach such as
Data Domain’s.
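
The trade-off between the two approaches can be illustrated with a toy timing model. Every number below is hypothetical, chosen only to make the arithmetic easy; none is a vendor benchmark. The model also assumes a single de-dupe engine that handles one backup at a time, which is a simplification rather than a description of either product.

```python
def inline_window(sizes_gb, inline_rate):
    """Hours to receive all backups when de-dupe gates the ingest rate."""
    return sum(sizes_gb) / inline_rate


def hybrid_times(sizes_gb, landing_rate, dedupe_rate):
    """Return (backup window, time when all de-dupe work is finished).

    Each backup first lands on disk at landing_rate. De-dupe of a backup
    starts once it has fully landed and the engine is free, so it overlaps
    with the arrival of the next backup.
    """
    landed_at = 0.0
    dedupe_free_at = 0.0
    for size in sizes_gb:
        landed_at += size / landing_rate
        start = max(landed_at, dedupe_free_at)
        dedupe_free_at = start + size / dedupe_rate
    return landed_at, dedupe_free_at


# Hypothetical workload: three 1,000GB backups, rates in GB/hour.
backups = [1000, 1000, 1000]
inline_hours = inline_window(backups, inline_rate=1000)
window_hours, done_hours = hybrid_times(
    backups, landing_rate=2000, dedupe_rate=1000
)

print(f"in-line: backup window {inline_hours} h, de-duped on arrival")
print(f"hybrid:  backup window {window_hours} h, de-dupe done at {done_hours} h")
```

With these made-up rates the hybrid appliance closes the backup window in half the time, but the last backup is not fully de-duped until after the in-line appliance would have finished. Different rates would shift the balance either way.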

This approach also has a post-process advantage. When an error occurs that
requires urgent recovery from the most recent backup (usually the one that
needs recovering), the appliance can restore from the backup still sitting
inside it without the overhead of having to ‘un-de-dupe’ it.

To the user, the only visible difference between Data Domain and ExaGrid may
be performance. Since most de-dupe approaches in the open systems market
build up to an average of around 95% space saved on a full backup output,
performance may be the key differentiator. Against other vendors, Data
Domain and ExaGrid almost certainly lead in terms of ease of use (both are
transparent) and may debate the pros and cons of their respective solutions’
reliability and scalability. But Data Domain’s announcement markedly
improves its performance competitiveness.
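
To put the 95% figure in more familiar terms, the short calculation below converts a space-saved percentage into a reduction ratio. The 10TB backup size is a made-up example and the function names are invented for illustration.

```python
def reduction_ratio(percent_saved: float) -> float:
    """Reduction ratio (N in 'N:1') for a given percentage of space saved."""
    return 100 / (100 - percent_saved)


def stored_size(full_size_tb: float, percent_saved: float) -> float:
    """Disk space left in use after de-dupe, in TB."""
    return full_size_tb * (100 - percent_saved) / 100


ratio = reduction_ratio(95)
remaining_tb = stored_size(10, 95)  # hypothetical 10TB full backup

print(f"95% space saved is a {ratio:.0f}:1 reduction")
print(f"a 10TB full backup would occupy {remaining_tb}TB")
```

A 95% saving is therefore a 20:1 reduction. Because most products converge on a similar figure, a point or two of difference in space saved is unlikely to be what a buyer notices first.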

Both companies have excellent technologies and will
continue to advance, spurred on partly by each other. They also operate in
different if overlapping parts of the market (for instance, Data Domain
provides a virtual tape library (VTL) solution with de-dupe, as demanded by many
enterprises).

In these cash-strapped times, backup de-dupe is an
obvious source of operating cost savings and provides a quick ROI—so both
companies should continue to thrive.