Content Copyright © 2011 Bloor. All Rights Reserved.
Also posted on: The IM Blog
One of the things that concerns companies about archiving, and perhaps puts them off, is how they will access the data after it has been archived. If the originating application is still in operation then you may be able to use SQL to define and retrieve an extract from the archive and recreate it within the operational system. Of course, this approach is not available if the application has been retired, and in any case it is cumbersome, though if you simply want to run analytics then it's probably fine. Otherwise, the typical approach is query-based: either directly via a business intelligence tool or via a join or federated view.
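As a minimal sketch of that extract-and-recreate approach, assuming (purely for illustration) a SQLite file standing in for the archive store and a hypothetical `orders` table:

```python
import sqlite3

# Hypothetical archive: an in-memory SQLite database stands in for the
# archive store; a real archive would be a file or a dedicated repository.
archive = sqlite3.connect(":memory:")
archive.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'Acme', 120.0), (2, 'Globex', 75.5);
""")

# Define and retrieve an extract with plain SQL...
extract = archive.execute(
    "SELECT id, customer, total FROM orders WHERE total > ?", (100,)
).fetchall()

# ...and recreate it within the (still-running) operational system.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
operational.executemany("INSERT INTO orders VALUES (?, ?, ?)", extract)
```

The sketch also shows why this route is cumbersome: every new question means defining a new extract and rebuilding a target table for it.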
The problem with all of these query-based approaches is that they are exactly that: queries. You don't get the full functionality of an application that lets you move freely through the data in a train-of-thought fashion. Moreover, if you have to go to IT to get new queries defined (which you probably will), how long will it be before they are working?
I have found a better way. DataNovata from NSC Programming is, for want of a better term, an archived data application generator. That is, it reads the archived database schema (it doesn't have to be a relational database; it could be RainStor, for example, which preserves schema information even though it is a flat file system) as well as the archived data itself, and then auto-generates a relevant web-based application on that basis. There are a variety of options for customisation. If some or all of the data is unstructured then there is a module that can create structure from it: building indexes and defining primary/foreign keys.
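DataNovata's internals are not public, but the general principle, reading schema metadata from which navigation screens can then be generated, can be sketched using SQLite's introspection pragmas and an invented two-table schema (both are illustrative assumptions, not DataNovata's actual mechanism):

```python
import sqlite3

# Hypothetical archived database with two related tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(id),
                          total REAL);
""")

def describe_schema(conn):
    """Walk the schema and return, per table, its columns and foreign keys:
    the raw material from which browse/navigate screens could be generated."""
    schema = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({t})")]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        links = [(r[3], r[2], r[4])
                 for r in conn.execute(f"PRAGMA foreign_key_list({t})")]
        schema[t] = {"columns": cols, "links": links}
    return schema

schema = describe_schema(db)
```

Once a tool holds this map of tables, columns and relationships, it can generate one screen per table and one hyperlink per foreign key, which is what makes free navigation through the archive possible without hand-written queries.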
Such applications are, of course, read-only, though you can add notes to the archived data to support collaboration when a problem or issue arises (which is presumably why you are looking at the archived data in the first place). Purge options are built into the application.
There are several nice things about this approach. The first is that the generated application is user-focused. The second is that the generation itself is automated, apart from any customisations you might want. The third is that, unlike using a BI tool where you have to define a whole set of queries for different purposes, here you effectively get all of your queries from a single process. Of course, you may already have a BI tool that you can use for this purpose, but the relatively low cost of DataNovata should mean that you more than recoup that cost, thanks to the ease of the whole process and the minimal reliance on IT staff. Not to mention time to value.
I am firmly of the opinion that all large organisations should formally archive their data on an ongoing basis, yet many companies do not. If access to the data is one of the reasons for holding back on implementing archival, then the entry into the market of DataNovata should be welcomed by all and sundry.