In the process of losing its bidding battle with EMC for de-duplication market leader Data Domain, Network Appliance (NetApp) has exposed its weakness in the de-duplication (de-dupe) market sector. It had previously developed its own technology (A-SIS) but evidently accepted that Data Domain provided a better bet.
Data Domain preferred NetApp, which seemed to have the deal sewn up with an accepted bid of $1.5 billion; then EMC made a hostile bid of $1.8 billion and, when NetApp matched this, EMC went higher. NetApp was undoubtedly prudent to walk away at that point. Perhaps it did not do so badly (and I am not just referring to its receiving a $57 million break-up fee from Data Domain, and thus effectively from EMC).
The de-dupe market is far more complex than simply saying "Data Domain is the market leader, so it must be the best." There is actually quite a choice of solutions, each suited to different situations.
Data Domain's approach is fast and efficient, as well as intuitive: the appliance sits in the data stream and de-dupes ‘in-line’ as the data is received, with no delay to the output. It is free-standing and essentially plug-in-and-go, delivering immediate results. Yet this hardly scratches the surface of the evolving market.
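To make the in-line approach concrete, here is a minimal Python sketch of in-line de-duplication using fixed-size chunking and SHA-256 fingerprints. This is purely illustrative and all names are my own; real appliances such as Data Domain's use far more sophisticated techniques (typically variable-size, content-defined chunking), but the principle is the same: each chunk is fingerprinted as it arrives, and only previously unseen chunks are actually stored.

```python
import hashlib

class InlineDeduper:
    """Toy in-line de-duper: fixed-size chunks, SHA-256 fingerprints.

    Data is de-duped as it arrives, so only unique chunks ever reach
    the store; a 'recipe' of fingerprints rebuilds the stream later.
    """
    CHUNK = 4096

    def __init__(self):
        self.store = {}    # fingerprint -> unique chunk bytes
        self.recipe = []   # ordered fingerprints for reconstruction

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.CHUNK):
            chunk = data[i:i + self.CHUNK]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.store:   # store each unique chunk once
                self.store[fp] = chunk
            self.recipe.append(fp)

    def read(self) -> bytes:
        return b"".join(self.store[fp] for fp in self.recipe)

dedupe = InlineDeduper()
block = b"x" * InlineDeduper.CHUNK
dedupe.write(block * 100)             # a stream of 100 identical chunks
print(len(dedupe.store))              # 1 -- only one unique chunk stored
assert dedupe.read() == block * 100   # reconstruction is lossless
```

Because the fingerprint lookup happens before anything is written, the space saving is immediate; the trade-off, and the reason post-process vendors exist, is that this hashing sits directly in the back-up data path.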
Quantum's appliances compete directly with Data Domain's and have been offered by EMC (so are likely to be dropped over time). Meanwhile, FalconStor's virtual tape library (VTL) solution with de-duplication competes well against Data Domain's VTL option.
ExaGrid and Sepaton are two of the independent vendors providing post-process de-duplication, in which the compression is carried out after the back-up completes, since that stage is not as performance-critical as the in-line path. ExaGrid's approach looks like in-line to the user, as it backs up onto its own appliance ‘in-line’ and immediately de-dupes the data out to the destination.
Both also cluster their appliances to gain greater scalability. (Data Domain is expected to add this capability in due course but does not have it today.) Sepaton's approach is also geared to specific back-up application types, so it can achieve greater compression for some formats by recognising and removing the applications' headers and data markers from the data stream.
Ocarina (also post-process) is unique in using content-aware compression, with algorithms that can de-dupe already-compressed JPEG and MPEG file formats; none of the others has so far made any impression on these formats—a problem that is growing as graphics and video files become ever more common.
Then there is CommVault's Simpana, which takes a more global approach, embedding de-dupe in all back-ups, remote replication and archiving; CommVault is also so far the only vendor providing de-dupe even for archive tape. NetApp itself was the first to offer de-dupe for primary data, with very little performance overhead. However, I can understand some nervousness about playing with the integrity of primary files, as distinct from back-ups.
From a legal and security standpoint, there are a couple of basic de-dupe issues. One cannot de-dupe encrypted data—but leaving it unencrypted in order to de-dupe it obviously makes it more vulnerable to attack. Then, fairly obviously, de-dupe systems need to tamper with the stored data; yet some legal cases hinge on the ‘real evidential weight’ of the stored information, so that tampering could in theory be used to swing a case. De-dupe therefore needs careful consideration by those organisations for whom security or legal concerns are critical.
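The encryption point is easy to demonstrate with a toy sketch: identical plaintext chunks de-dupe down to a single stored copy, but once the stream is encrypted every chunk looks unique and there is nothing left to de-dupe. Here encryption is modelled, purely for illustration, by a one-time random XOR pad (not a real cipher); a proper cipher in a sound mode likewise produces high-entropy output in which identical plaintext blocks no longer yield identical ciphertext.

```python
import hashlib
import os

CHUNK = 4096

def unique_chunks(data: bytes) -> int:
    """Number of distinct fixed-size chunk fingerprints in a stream."""
    return len({hashlib.sha256(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)})

plaintext = b"A" * CHUNK * 50        # 50 identical 4 KB chunks
print(unique_chunks(plaintext))      # 1 -- de-dupes perfectly

# 'Encrypt' with a one-time random pad (illustration only, NOT a real
# cipher). The randomness makes every ciphertext chunk distinct.
pad = os.urandom(len(plaintext))
ciphertext = bytes(a ^ b for a, b in zip(plaintext, pad))
print(unique_chunks(ciphertext))     # 50 -- nothing left to de-dupe
```

This is why the choice is stark in practice: de-dupe before encrypting (and protect the de-dupe store), or encrypt first and forgo the space savings.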
Finally, to my mind there is in any case a joker in the pack that may yet be played. Earlier this month I wrote about companies specialising in IT infrastructure optimisation, including WAN optimisation—and it is no surprise that they use various advanced single-instancing (SI) and de-dupe techniques; some of these will reduce the size of even an already de-duped back-up copy.
So one argument, looking to the future, goes: "Who needs de-duplication appliances at all when WAN optimisation has even better technology built in?" This, of course, assumes the cost-benefit case for installing such optimisation software and equipment stands on its own, with de-duplication then providing little or no extra benefit. (In time that might become the case, but it is not so just yet.)
NetApp clearly has a few alternative companies it could go for—or partner with—assuming it does not want to develop its own technology further. Right now these may seem like second choices, but they are really just alternative ways of skinning the cat, so to speak, and some of them are very sound.
NetApp will no doubt think carefully about its strategy, so it could yet turn this episode into a success. Whether EMC proves to be a good home for Data Domain is another matter.