The problem with archiving

Written By:
Content Copyright © 2005 Bloor. All Rights Reserved.

To use the immortal words of Sellar and Yeatman, archival is a “good thing”. By taking data off your front-end systems you improve the performance of said systems and reduce the need to upgrade them. You can also reduce costs by storing the data on less expensive disk or in near-line systems, or you can store data off-line altogether. Further, you get compliance more or less built in, subject to appropriate auditing software.

However, there is a problem with archival systems. Actually, from a personal perspective there are two problems. The first is that archival falls squarely between data management (my practice area) and systems/storage management, which is headed up by my colleague Tony Lock. But you’re not interested in that. Nevertheless, it is symptomatic of the second, wider problem with archival, which is that there is (as far as I can tell from my data management perspective) no single solution that will do everything you want it to do.

To understand what I mean, consider the following. Sand, with its Archive Server product, offers the best compression rates (by a distance) of any vendor for structured data on near-line storage. However, it is focused specifically on archival from a data warehouse. Now think about AXS-One: this company is a leading supplier of compliance and archival solutions for e-mail and other documents and records, and it has special features for searching its archive, including case management (for legal eagles). And finally, there is OuterBay, which is focused on archival for front-office data and which provides features that, for example, automatically preserve transactional integrity across the archived environment.

Each of these solutions could reasonably be considered best-of-breed in its own space, but none of them provides a complete solution. It would, of course, be possible to implement all three, as they could be regarded as complementary. However, that in itself could be a problem: you would have a different bulk loader for each product, different enquiry and search mechanisms, different levels of compliance and audit reporting, and multiple policy and retention managers (two, actually, since Sand does not have one). In these days of service-oriented architectures you don’t want two or three of anything; you just want one.

So much for the bad news. The good news is that these products are all developing. The most recent to announce a major new release is OuterBay. The release includes a number of major new features as well as a re-packaging into two editions: an Enterprise Edition and a Compliance Edition. As far as features are concerned, the most significant are the new bulk loader, which the company tells me is five times faster at loading data; the new compliance capabilities, which provide data lineage by recording every movement of data from one medium to another (who did it, when, what and so on, every time you load or unload); and a new policy manager capability.
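The idea of lineage through movement recording (who moved what, from which medium to which, and when) can be illustrated with a minimal Python sketch. It assumes a simple in-memory, append-only log; the names MovementRecord, LineageLog, record_move and history are hypothetical and are not taken from OuterBay's product, whose internals are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class MovementRecord:
    """One movement of a data set between media (hypothetical schema)."""
    dataset: str
    operation: str        # e.g. "load" or "unload"
    source_medium: str
    target_medium: str
    user: str             # who did it
    timestamp: datetime   # when it happened (timezone-aware)


class LineageLog:
    """Append-only log: records are added but never modified or deleted."""

    def __init__(self) -> None:
        self._records: List[MovementRecord] = []

    def record_move(self, dataset: str, operation: str, source_medium: str,
                    target_medium: str, user: str,
                    timestamp: Optional[datetime] = None) -> MovementRecord:
        # Default to "now" in UTC so all timestamps stay comparable.
        rec = MovementRecord(dataset, operation, source_medium, target_medium,
                             user, timestamp or datetime.now(timezone.utc))
        self._records.append(rec)
        return rec

    def history(self, dataset: str) -> List[MovementRecord]:
        """Every recorded movement of one data set, oldest first."""
        return sorted((r for r in self._records if r.dataset == dataset),
                      key=lambda r: r.timestamp)


# Usage: trace a data set from production disk to near-line disk to tape.
log = LineageLog()
log.record_move("orders_2003", "unload", "production disk", "near-line disk",
                "alice", datetime(2005, 1, 10, tzinfo=timezone.utc))
log.record_move("orders_2003", "unload", "near-line disk", "tape",
                "bob", datetime(2005, 6, 1, tzinfo=timezone.utc))
for r in log.history("orders_2003"):
    print(r.timestamp.date(), r.user, r.source_medium, "->", r.target_medium)
```

Making the log append-only is the design choice that matters for compliance: an auditor can rely on the history only if past entries cannot be rewritten.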

Personally, I don’t see how you can have a decent archival solution (in any environment) without some sort of policy management. There are really two elements to this: the first is the ability to identify data that should be archived (for example, information that hasn’t been accessed for x months), and the second is the management of the archives themselves: what data is on what media, when it was placed there, when it is due to be moved elsewhere and so on.
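The two elements of policy management can be sketched in Python as follows. This is an illustration under stated assumptions, not any vendor's policy manager: "x months" is approximated as 30-day months, each medium is assumed to have a fixed dwell time before data is due to move on, and the names Record, ArchiveEntry, archive_candidates and ArchiveCatalog are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Record:
    key: str
    last_accessed: date


@dataclass
class ArchiveEntry:
    key: str
    medium: str        # what media the data is on
    placed_on: date    # when it was placed there
    move_due: date     # when it is due to be moved elsewhere


def archive_candidates(records: List[Record], today: date, months: int) -> List[str]:
    """Element 1: keys not accessed for `months` months (assumed 30-day months)."""
    cutoff = today - timedelta(days=30 * months)
    return [r.key for r in records if r.last_accessed < cutoff]


class ArchiveCatalog:
    """Element 2: track what is on which medium and when it must move."""

    def __init__(self, dwell_days: Dict[str, int]) -> None:
        # Hypothetical policy: a fixed number of days data stays on each medium.
        self.dwell_days = dwell_days
        self.entries: Dict[str, ArchiveEntry] = {}

    def place(self, key: str, medium: str, on: date) -> None:
        due = on + timedelta(days=self.dwell_days[medium])
        self.entries[key] = ArchiveEntry(key, medium, on, due)

    def due_for_move(self, today: date) -> List[str]:
        return sorted(e.key for e in self.entries.values() if e.move_due <= today)


# Usage: archive stale records to near-line disk, then check what is due to move.
today = date(2005, 6, 1)
records = [Record("order-1", date(2004, 1, 15)), Record("order-2", date(2005, 5, 20))]
stale = archive_candidates(records, today, months=6)   # ["order-1"]
catalog = ArchiveCatalog({"near-line disk": 365, "tape": 7 * 365})
for key in stale:
    catalog.place(key, "near-line disk", today)
print(catalog.due_for_move(date(2006, 6, 2)))            # ["order-1"]
```

A real policy manager would also need persistence, auditing and richer rules (by data type or regulation, for example); the sketch only shows how identifying candidates and tracking archive placement fit together.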

Given that OuterBay has now added this facility, it would be fair to say that the company’s products have come of age. It’s only a shame that they do not encompass a broader environment. Sooner or later someone will deliver an archival solution that does: whether that is OuterBay or someone else remains to be seen.