At A3’s recent Technology Live event in Munich, I got to meet a couple of vendors selling novel long-term write-once storage solutions. By “long-term”, they are thinking in terms of centuries, about preserving the intellectual artifacts of our civilization for future civilizations (assuming that there are any, after our “experiments” with climate change and nuclear weapons). The problem they are addressing is the impermanence of modern records. We know a fair bit about the Babylonians, for example, because we can read their cuneiform records, which they incised into clay tablets. Clay tablets last well – and last even better if the house they are stored in burns down. We are increasingly keeping electronic records, and these usually have a very limited lifetime. Even if a 5.25-inch floppy disk physically lasts a century or so (doubtful), will there be any devices to read it then, will they still work, and if they do, will anyone still understand the data formats used? And even if they do, how much will the semantics of the language used have changed, if that language is still “living”? Reading cuneiform on a baked clay tablet may be easy in comparison (the Babylonian language is hardly changing anymore).
Preserving our culture is only one use case for these new long-term storage technologies (which I’ll describe in more detail later), but backup, in my opinion, isn’t really one of them. Let us look at backups versus archives, and at the technology solutions each needs.
A backup is really only useful in the context of restoring data after a failure or mishap, although it can sometimes also be used as a snapshot of an organization’s data at a point in time, for processing independently of live production systems (this processing thus having little or no impact on production performance). A backup must be accurate, secure (stealing yesterday’s, but still current, data from a backup is sometimes trivially easy, compared to hacking into a production system) and reasonably fast to access. Backup and restore must be designed, although there are relatively few use cases to consider (is random access needed, or is sequential access adequate, for example); the main issue is that restore must be tested regularly. A backup is intended for short-term use. In extremis, you might be able to use a backup from 10 years ago to, say, recreate a transaction that is the subject of a legal dispute, if it is still readable, but you might well not. For a start, does the backup support referential integrity of the data – does it hold all the data involved with a transaction and all the relationships needed to support processing logic? Then, can you afford to process it – if the backup is arranged as a serial file, and your processing logic requires random access, recreating a process over time might be horribly expensive. And are you sure that the backup hasn’t been changed in any way over the years? It all depends on the use cases you have to satisfy – and sometimes what you really need is an archive.
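That last question, at least, can be answered mechanically: record a cryptographic digest of every file in the backup set when it is written, and check those digests again before you trust a restore. Here is a minimal sketch, in Python, of that sort of fixity check; the paths and the JSON manifest format are purely illustrative assumptions, not any particular backup product’s approach.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file in chunks, so arbitrarily large backup files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """At backup time: record a digest for every file in the backup set."""
    entries = {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(entries, indent=2))


def verify_manifest(backup_dir: Path, manifest: Path) -> list[str]:
    """Before a restore: return any files that no longer match their digest."""
    recorded = json.loads(manifest.read_text())
    return [
        name
        for name, expected in recorded.items()
        if sha256_of(backup_dir / name) != expected
    ]


if __name__ == "__main__":
    # Illustrative locations only.
    backup_dir = Path("/backups/2025-06-30")
    manifest = Path("/backups/2025-06-30.manifest.json")
    write_manifest(backup_dir, manifest)   # when the backup is taken
    print(verify_manifest(backup_dir, manifest) or "backup set unchanged")
```

Running the verification as part of every scheduled restore test, rather than only when a dispute arises, is what turns it from a nice idea into evidence.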
The key characteristic of an archive, according to my friendly AI’s trawl of the Internet (I did check with a dictionary), is that: “it holds materials for long-term preservation, often due to their enduring cultural, historical, or evidentiary value”. I think that this is an incomplete definition (sorry, AI) of a real archive, because it is important that the archived materials remain usable. This implies that the archive is designed to accommodate likely future use cases, with policies and procedures around it to ensure that it remains readable as technology and society change – that it is, in fact, a managed archive system – and that it is regularly tested to ensure that it remains usable.
Now, let us look at some use cases. Museums and cultural institutions are currently digitizing cultural assets – so even if the originals are destroyed, the cultural asset is not wholly lost. In the short term, these assets can be kept on ordinary magnetic storage made write-once with software and backed up in the normal way. But that rather assumes that the cultural institution managing these assets will survive the zombie apocalypse – or the dysfunctional future represented by the 49th term of Barron Trump as hereditary World President – which is by no means guaranteed. What you really want, if you care about the future of your culture, is analogue images alongside the digital assets, inscribed on something like stone tablets and complete with a user manual, buried under a sewage farm or somewhere else where the iconoclasts are unlikely to come looking.
Another use case is the preservation of family history, of the sort often lost forever in a house clearance after someone dies. Stone tablet technology might be useful here too, but (I’d suggest) best supplied “as a service” by an interested organization such as a family history or genealogical society. Such an organization will be able to advise on what is worth keeping and on how to ensure that your great-grandchildren still have access. An in-house archive of family pictures, on a laptop that your children sell on eBay when you die, won’t cut it.
There are many use cases for archival storage, but I’ll just consider one more here – protection against ransomware. A conventional backup routine with some write-once storage can probably cope with this, but for ultimate assurance, you might want to keep the system snapshots you’ll rely on to recreate your hijacked environment on an entirely different technology to the one that the cyber-criminals are exploiting. Again, you’ll need to proactively imagine various ransomware scenarios and design your business archive appropriately. And test simulated ransomware recoveries before you are actually attacked.
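What “test simulated recoveries” means will vary, but the shape of a rehearsal is usually the same: restore the latest clean snapshot into an isolated environment, check that something usable actually comes back, and check that it comes back within your recovery-time objective. The sketch below assumes snapshots are plain files on write-once media mounted read-only; the paths, the copy-based restore and the 60-minute objective are illustrative assumptions, not a recommendation for any particular tool.

```python
import shutil
import time
from pathlib import Path

# Illustrative locations: a write-once snapshot store mounted read-only, and
# an isolated "clean room" target kept off the production network.
SNAPSHOT_STORE = Path("/mnt/worm-snapshots/latest")
CLEAN_ROOM = Path("/srv/recovery-rehearsal")


def rehearse_recovery(max_minutes: float = 60.0) -> None:
    """Restore the latest snapshot into an isolated area and time it.

    A real rehearsal would also start the restored services and run
    application-level smoke tests; this sketch only checks that the restore
    completes, produces files, and meets the recovery-time objective.
    """
    if CLEAN_ROOM.exists():
        shutil.rmtree(CLEAN_ROOM)

    started = time.monotonic()
    shutil.copytree(SNAPSHOT_STORE, CLEAN_ROOM)
    elapsed_minutes = (time.monotonic() - started) / 60

    restored_files = sum(1 for p in CLEAN_ROOM.rglob("*") if p.is_file())
    if restored_files == 0:
        raise RuntimeError("restore produced no files: snapshot store empty or unreadable")
    if elapsed_minutes > max_minutes:
        raise RuntimeError(
            f"restore took {elapsed_minutes:.1f} min, over the {max_minutes:.0f} min objective"
        )
    print(f"restored {restored_files} files in {elapsed_minutes:.1f} min")


if __name__ == "__main__":
    rehearse_recovery()
```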
Now, I’ll describe the two new technologies which piqued my interest. First, Cerabyte, which promises to deliver “a new tier in the storage stack”. In 2030, it expects to fit between tape and spinning rust in the stack, with a media lifetime of 100+ years, access to particular data in seconds, data transfer rates of around 1-2 GB/sec and a cost of about $1/terabyte. Inspired by Babylonian clay tablets, it uses “ceramic punch cards at nanoscale”, reusing micro-mirror technology (developed for maskless lithography in semiconductor fabrication) to write millions of bits at a time to a ceramic-coated, thin, flexible sheet of glass (basically silica sand, so very “green”). The data is encoded using (most importantly) an open codec. The sheets are packaged so as to fit in existing tape library automation technology. Read speed is achieved through massive parallelization, using CMOS sensors. The entire data lifecycle is supported: writing, storage, access, reading, deletion (via a delete flag) and destruction/recycling (the glass sheets can be crushed and recycled). One use case considered is the safeguarding of data and information across generations: keeping personal memories for a lifetime with a small eco-footprint, and passing on that digital heritage to future generations. It also targets the usual government/regulatory/scientific archive requirements. Perhaps the bigger vision, however, is that it makes permanent storage actionable – for analytics and AI looking at very long-term trends. It has a very plausible future roadmap, with EU funding, and it makes a great point of its re-purposing of existing technologies and industries (from display-glass, digital-projector and smartphone applications), which gives one confidence in the roadmap.
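To be clear, Cerabyte has not published its codec here, and the following is not it – it is just a toy Python illustration of the general idea of a self-describing two-dimensional “punch card”: data bits laid out in a matrix on a flat medium, with a parity check to catch damage and a flag to mark logical deletion on media that can never be rewritten.

```python
import numpy as np

# A toy "punch card" encoding: each byte becomes a row of 8 bits plus an
# even-parity bit, with one header row carrying a delete flag. This is NOT
# Cerabyte's codec - just a way to picture data as marks on a flat medium.


def encode(data: bytes, deleted: bool = False) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8)).reshape(-1, 8)
    parity = (bits.sum(axis=1, keepdims=True) % 2).astype(np.uint8)
    header = np.zeros((1, 9), dtype=np.uint8)
    header[0, 0] = 1 if deleted else 0          # logical-delete flag
    return np.vstack([header, np.hstack([bits, parity])])


def decode(card: np.ndarray) -> tuple[bytes, bool]:
    header, body = card[0], card[1:]
    bits, parity = body[:, :8], body[:, 8]
    if not np.array_equal(bits.sum(axis=1) % 2, parity):
        raise ValueError("parity error: medium damaged or misread")
    return np.packbits(bits.flatten()).tobytes(), bool(header[0])


card = encode(b"clay tablets, but much smaller")
data, deleted = decode(card)
assert data == b"clay tablets, but much smaller" and not deleted
```

A real codec would use serious error correction and a written specification of the physical layout – which is precisely what an “open codec” is meant to guarantee for whoever tries to read the media in a century’s time.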
Second is Piql – you “pickle” your data for long-term storage and Piql promises to deliver “your data immortalized”. Of course, that is not all that it offers, and perhaps one should be careful about what one asks for – do you want dissatisfied customers reopening disputes from decades ago, knowing that you still have records from that time? The Piql story is very similar to Cerabyte’s, except that the storage medium is different – 35mm PET (polyethylene terephthalate) cinema film. This is pretty resilient, and probably more resistant than glass plates to bomb attacks, but vulnerable to some industrial solvents (such as dimethyl sulfoxide, DMSO). Piql has been around since 2002 (converting digital films to analogue film stock), compared to 2022 for Cerabyte, so it probably got to the messaging first (although I’m not sure that it can still claim to be unique). It is strongly supported by the Norwegian government and appears to have a significant installed base and an impressive customer list. Its archives include descriptions of the data encoding used, possibly an analogue (human-readable) image of any digitized pictures, and instructions for building an archive reader from scratch. It has also invested in practical longevity, with established “piqlVaults” (including one under a mountain in Norway) for safe long-term storage of your data on piqlFilm. At the Iron Mountain piqlVault in Norway, for instance, “…security is the top priority. The vault facility owned by Bulk Infrastructure where the piqlFilms are stored follows best practices and is certified according to ISO 9001 (Quality Management) and ISO 27001 (Information Security Management). Furthermore, emergency preparedness and contingency plans are in place that align with international ISO 22301 standards, ensuring robust infrastructure resilience”. The availability of this sort of storage resilience, and the policy and procedures around it, is probably just as important as the resilience of piqlFilm itself.

My final takeaway from all this? Well, that archiving is a matter of risk management. Your risk management, not someone else’s. You need to think about what your archive is supposed to achieve and what threats it will have to deal with – and over what time period. Then if, and only if, you need them, there are technologies that will give you a fighting chance of maintaining your archives for hundreds of years or even, conceptually, “forever” (although whether your organization, or anybody else, will be around for that long, to make use of the archive, is moot). Nevertheless, the infrastructure and policies/procedures around your archive technology are just as important as the technology itself, and most useful archives will also need to be of use, and cost-effective, over a much shorter time period, in order to justify their creation.