3 strange things about data archival

I used to regard data archival as pretty boring. Yes, I took briefings from companies like Bit by Bit (bought by OuterBay), OuterBay (acquired by HP) and Princeton Softech (now part of IBM) but this was more a matter of duty than of any great interest. I had a brief flicker of interest when SAND introduced its Archive Server but this was more because of the technology than archival per se. Basically, I figured that all these companies were doing was building a better mousetrap: data archival has such blindingly obvious cost and performance benefits that it seemed clear that any company with even half a brain spread across its IT department would have implemented it years ago, so I pretty much ignored the whole space. It thus came as a shock to me last year to discover that only a (small) minority of companies have actually implemented data archival. Is it that which is strange, or the fact that there are so many IT departments that lack half a brain?

Another thing that is strange, and which perhaps explains the preceding discussion, is that most data archival products do not work on the basis of archiving the data you aren’t using, which seems the most obvious reason for archival (though there are others). Certainly that is true of both IBM, which is the market leader, and Informatica (which acquired Applimation in February), which is the market leader for archiving data from packaged applications such as SAP. The best approximation to data you aren’t using is arguably time-based archival but this is an extremely blunt instrument. What you really want to be able to do is to archive data that is not being used: so-called dormant data. Now it is quite likely that IBM, Informatica and other vendors are planning to introduce dormant data discovery. Indeed, it is a little while since I had a briefing from IBM on this (I have one arranged in early June) so they may have already done so. However, that’s not really the point. The point is that I wrote a product review of Amdahl (Fujitsu) SUNRISE in 2000. In that paper I described how this product was able to determine non-changing (in other words, dormant) data by monitoring how applications were actually used. I guess I naturally assumed that other vendors would have followed suit. It’s pretty strange that they’ve taken a decade to do so. And perhaps it explains why data archival hasn’t been as successful as it should have been: maybe it’s the vendors who lack half a brain.
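To make the difference between time-based and dormancy-based archival concrete, here is a minimal sketch in Python. It is purely illustrative: the record ids, the dates, the two thresholds and the idea of using a "last accessed" date as the measure of dormancy are all my own assumptions, and it does not describe how SUNRISE or any other product actually works.

```python
from datetime import date, timedelta

# Hypothetical records: (record id, date created, date last accessed).
RECORDS = [
    ("inv-001", date(2005, 3, 1), date(2009, 4, 20)),   # old but still in use
    ("inv-002", date(2005, 6, 1), date(2005, 7, 15)),   # old and dormant
    ("inv-003", date(2008, 11, 1), date(2008, 11, 2)),  # recent but dormant
    ("inv-004", date(2009, 2, 1), date(2009, 5, 1)),    # recent and in use
]


def time_based(records, today, max_age_days):
    """Select records purely by age, whether or not anyone still uses them."""
    cutoff = today - timedelta(days=max_age_days)
    return [rid for rid, created, _ in records if created < cutoff]


def dormancy_based(records, today, idle_days):
    """Select records that nobody has touched for at least idle_days."""
    cutoff = today - timedelta(days=idle_days)
    return [rid for rid, _, last_access in records if last_access < cutoff]


today = date(2009, 5, 15)
by_age = time_based(RECORDS, today, max_age_days=365)
by_dormancy = dormancy_based(RECORDS, today, idle_days=180)

print(by_age)       # ['inv-001', 'inv-002']
print(by_dormancy)  # ['inv-002', 'inv-003']
```

The age rule archives inv-001 even though it was used last month, and it misses inv-003, which has sat untouched since the day after it was created. That is what I mean by a blunt instrument.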

The third strange thing about data archival is not so much about archiving itself but the fact that one of its core underlying technologies is not fully appreciated. I refer to the fact that to archive data effectively you need to ensure that the data is referentially intact, and this is much easier to achieve if you work at the level of business entities rather than at the table level. Both IBM Optim and Informatica (Applimation) do this. However, while Informatica is clear that this also has significant advantages in other areas where you want to move data around, for example data migration, IBM has been slow to move in this direction. Certainly, the recent acquisition of Exeros indicates that the company appreciates the importance of understanding relationships but this isn’t the same thing as explicitly supporting the migration of business entities as opposed to database tables. In this case I don’t think it’s anything to do with lack of brainpower but the fact that they are too busy doing other (and arguably too many) things. Nevertheless, I think they are missing a trick by not recognising that data migration and similar activities have special requirements that they could exploit using technology imported from Optim, and that’s strange.
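For readers who have not met the idea of a business entity, the following sketch may help. It uses Python's built-in sqlite3 module and a made-up schema (orders and order_lines, with matching archive tables) that I have invented for illustration; it is not how Optim or Informatica implement this. The point is simply that an order and its lines move together, as one unit, in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        item TEXT
    );
    CREATE TABLE archived_orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE archived_order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES archived_orders(id),
        item TEXT
    );
    INSERT INTO orders VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO order_lines VALUES
        (10, 1, 'widget'), (11, 1, 'gasket'), (12, 2, 'sprocket');
""")

# Table-level thinking: remove a row from one table in isolation.
# The foreign key stops this, because the order's lines would be orphaned.
try:
    conn.execute("DELETE FROM orders WHERE id = 2")
    table_level_blocked = False
except sqlite3.IntegrityError:
    table_level_blocked = True
conn.rollback()


def archive_order(conn, order_id):
    """Move one order and all of its lines to the archive tables as a unit."""
    with conn:  # commits if every statement succeeds, rolls back otherwise
        conn.execute(
            "INSERT INTO archived_orders SELECT * FROM orders WHERE id = ?",
            (order_id,))
        conn.execute(
            "INSERT INTO archived_order_lines "
            "SELECT * FROM order_lines WHERE order_id = ?",
            (order_id,))
        conn.execute("DELETE FROM order_lines WHERE order_id = ?", (order_id,))
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))


archive_order(conn, 1)
remaining_orders = conn.execute("SELECT id FROM orders").fetchall()
archived_lines = conn.execute(
    "SELECT id FROM archived_order_lines ORDER BY id").fetchall()
```

Exactly the same routine would serve for a migration: replace the archive tables with the target system and you are still moving whole orders rather than rows from individual tables, which is why I think the technology carries over.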