De-duplication (de-dupe) appliance market leader Data Domain today releases new operating system software for its appliances, which it says will boost performance by 50–100%. The new DD OS version 4.6 is a simple software upgrade, available to users at no extra charge.
This is an important development for Data Domain. While it has been busy pushing up into the enterprise market, it has seen increasing SMB competition, especially from ExaGrid, now rated number two in terms of de-dupe users, whose appliance has outperformed Data Domain's in some live tests. The two vendors' approaches differ, but they appear much the same to the user.
Data Domain gave an example of the potential speed improvement. Using its top-of-the-range DD690 system alongside Symantec NetBackup OpenStorage (OST) on a 10Gb Ethernet line, it now claims backup throughput of up to 760MB/second, or 2.7TB/hour. That is 90% faster than the DD690's best throughput when it was introduced in May last year.
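The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1TB = 1,000,000MB) and derives the launch throughput from the "90% faster" claim; that derived number is an inference from the two quoted figures, not one supplied by Data Domain.

```python
# Convert the quoted backup rate from MB/second to TB/hour (decimal units).
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def mb_per_sec_to_tb_per_hour(mb_per_sec: float) -> float:
    return mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


new_rate_mb_s = 760  # figure quoted in the article
speedup = 0.90       # "90% faster" than the launch figure

new_rate_tb_h = mb_per_sec_to_tb_per_hour(new_rate_mb_s)

# Implied launch throughput, derived from the two quoted figures
# (an inference, not a vendor-supplied number).
implied_old_rate_mb_s = new_rate_mb_s / (1 + speedup)

print(f"{new_rate_tb_h:.2f} TB/hour")              # 2.74 TB/hour
print(f"{implied_old_rate_mb_s:.0f} MB/second")    # 400 MB/second
```

On these assumptions the 2.7TB/hour figure is consistent with 760MB/second, and the DD690's launch throughput would have been around 400MB/second.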
Data Domain was a little coy on how it had achieved such an increase without any change of hardware. Brian Biles, Data Domain's VP of Product Management, credited the company's proprietary Stream Informed Segment Layout™ (SISL) technology, which is CPU-centric and software-based, so the gain must be entirely down to software improvements.
Data Domain's appliances carry out de-dupe 'in-line' with data backup, meaning that they convert the data as it is received. Speed of throughput is therefore especially important: the de-dupe process has to keep pace with the incoming data or the backup will slow down, which is at odds with shrinking backup windows. Equally critical is the speed of the 'un-de-dupe' restore process.
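As a rough illustration of what 'in-line' means, the sketch below fingerprints each incoming segment as it arrives and stores only the segments it has not seen before. It is a generic, minimal model and not Data Domain's SISL: the fixed 4KB segment size, the SHA-256 fingerprint, and names such as InlineDedupeStore are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed segment size; real products differ


class InlineDedupeStore:
    """Toy in-line de-dupe store: each segment is fingerprinted on arrival."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (stored once)
        self.recipes = {}   # backup name -> ordered list of fingerprints

    def ingest(self, name, stream_chunks):
        recipe = []
        for chunk in stream_chunks:
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only segments not seen before are written to storage.
            if fingerprint not in self.segments:
                self.segments[fingerprint] = chunk
            recipe.append(fingerprint)
        self.recipes[name] = recipe

    def restore(self, name):
        # The 'un-de-dupe' step: reassemble the backup from its recipe.
        return b"".join(self.segments[fp] for fp in self.recipes[name])


def chunked(data, size=CHUNK_SIZE):
    for start in range(0, len(data), size):
        yield data[start:start + size]


store = InlineDedupeStore()
monday = b"A" * 8192 + b"B" * 4096   # made-up backup contents
tuesday = b"A" * 8192 + b"C" * 4096  # mostly unchanged the next day
store.ingest("monday", chunked(monday))
store.ingest("tuesday", chunked(tuesday))

stored = sum(len(segment) for segment in store.segments.values())
print(stored, "bytes stored for", len(monday) + len(tuesday), "bytes backed up")
```

Because the fingerprint lookup happens inside the ingest loop, the backup can only run as fast as that loop, which is why throughput matters so much for an in-line design.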
"Data Domain continues to bypass the disk I/O bottleneck and instead rides the CPU price/performance curve," said Biles, in a reference to the way SISL boosts performance every time the number of CPU cores increases (although the core count has not changed here). "This announcement reconfirms the power of our optimised in-line de-duplication approach."
Most competitors do not attempt 'in-line'. They wait for the backup to complete and then run the de-dupe process on the completed backup ('post-process' de-dupe). This approach avoids any slowing of the backup but requires extra 'interim' disk space and takes longer overall to reach the tiny footprint of the de-duped backup. Data Domain's approach is more intuitive and, once installed, runs transparently alongside the existing way of working.
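The 'interim' disk space can be put in rough numbers. The model below is a simplification under stated assumptions: a post-process appliance is assumed to hold the full raw backup until its de-dupe pass completes, while an in-line appliance only ever stores the de-duped output. The 10TB backup size is hypothetical and the 95% saving is used purely for illustration.

```python
def peak_disk_tb(backup_tb, space_saved_pct, post_process):
    """Rough peak disk usage for one backup, in TB (simplified model)."""
    deduped_tb = backup_tb * (100 - space_saved_pct) / 100
    if post_process:
        # Assumption: the raw landing copy stays on disk until the
        # de-dupe pass has finished, so both copies coexist at the peak.
        return backup_tb + deduped_tb
    return deduped_tb


backup_tb = 10  # hypothetical full backup size
print(peak_disk_tb(backup_tb, 95, post_process=False))  # 0.5
print(peak_disk_tb(backup_tb, 95, post_process=True))   # 10.5
```

Real appliances may reclaim the interim space incrementally, so this should be read as a worst case rather than a measurement.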
ExaGrid uses a hybrid approach. Its appliance sits 'in-line' to the backup but is effectively divided in half, doing an internal 'post-process' de-dupe. One half captures the backup data straight onto its internal disk, using a grid architecture that makes this very fast. Only then does it start de-duping, outputting the data so that only the de-duped data arrives at the destination system (as with 'in-line'). If there is a series of backups, the first can be de-duped while the next is received and, overall, the backup may outperform a straight 'in-line' approach such as Data Domain's.
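The overlap between receiving one backup and de-duping the previous one behaves like a two-stage pipeline. The sketch below is a simplified timing model, not a description of ExaGrid's internals: the function name and the per-backup durations are hypothetical, and it assumes one backup is received at a time and one is de-duped at a time.

```python
def pipelined_finish(ingest_hours, dedupe_hours):
    """Return (time last backup is received, time last de-dupe finishes)."""
    ingest_done = 0.0
    dedupe_done = 0.0
    for ingest, dedupe in zip(ingest_hours, dedupe_hours):
        # Backups are received back to back onto the landing disk.
        ingest_done += ingest
        # De-dupe of a backup starts once it has fully landed and the
        # previous de-dupe pass has finished, whichever is later.
        dedupe_done = max(dedupe_done, ingest_done) + dedupe
    return ingest_done, dedupe_done


# Three hypothetical backups: 1 hour each to receive, 1.5 hours each to de-dupe.
window_closes, dedupe_complete = pipelined_finish([1, 1, 1], [1.5, 1.5, 1.5])
print(window_closes, dedupe_complete)  # 3.0 5.5
```

In this model the backup window closes after 3 hours, because receiving is never held up by de-dupe, while the de-duped output is complete after 5.5 hours.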
This approach also has a post-process advantage. When an error occurs that requires urgent recovery, it is usually the most recent backup that needs recovering. The appliance can restore from the backup still sitting inside it without the overhead of having to 'un-de-dupe' it.
To the user, the only visible difference between Data Domain and ExaGrid may be performance. Since most de-dupe approaches in the open systems market build up to an average of around 95% space saved on a full backup output, performance may be the key differentiator. Against other vendors, Data Domain and ExaGrid almost certainly lead in terms of ease of use (both are transparent), and they may debate the pros and cons of their respective solutions' reliability and scalability. But Data Domain's announcement markedly improves its performance competitiveness.
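Space saved is often quoted as a reduction ratio instead of a percentage. The short conversion below is plain arithmetic; the function name is made up for the example, and the 90% and 98% rows are included only for comparison with the 95% figure above.

```python
def reduction_ratio(space_saved_pct: float) -> float:
    """Return N for an N:1 reduction, given the percentage of space saved."""
    if not 0 <= space_saved_pct < 100:
        raise ValueError("space_saved_pct must be at least 0 and below 100")
    return 100 / (100 - space_saved_pct)


for pct in (90, 95, 98):
    print(f"{pct}% saved is about {reduction_ratio(pct):.0f}:1")
# 90% saved is about 10:1
# 95% saved is about 20:1
# 98% saved is about 50:1
```

A saving of 95% therefore corresponds to roughly a 20:1 reduction in stored data.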
Both companies have excellent technologies and will continue to advance, spurred on partly by each other. They also operate in different if overlapping parts of the market (for instance, Data Domain provides a virtual tape library (VTL) solution with de-dupe, as demanded by many enterprises).
In these cash-strapped times, backup de-dupe is an obvious source of operating cost savings and provides a quick ROI—so both companies should continue to thrive.