Some of us interested in full information lifecycle management (ILM) have long pointed to the need for an industry-standard metadata format that describes the data content of files at a more granular level. Currently, software that uses metadata in this way first has to create it in its own proprietary format, which is typically unusable by any other vendor's software wanting to access the same data.
This being primarily a storage problem, the obvious body to drive development of a standard is the Storage Networking Industry Association (SNIA). SNIA's response has been the development of the eXtensible Access Method (XAM) specification, version 1.0 of which was released last week. The specification has yet to gain member approval but, assuming this is achieved by mid-year as intended, SNIA will then submit it to ANSI and ISO for accreditation. Mid-2008 should also see the release of the XAM SDK, available under licence to industry developers.
Hold on a moment. I understand XAM is only addressing fixed format content at this time. This, though, is not the biggest problem. It may be a little inefficient but a user can, if needs be, hard-code access to a particular file type if the format of the individual fields is known; the code can then do field checks and be as granular as needed to decide to which storage tier to assign the data or move it—even without creating metadata.
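To make the hard-coded approach concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record layout, the field names, the tier names and the age thresholds are hypothetical and come from neither XAM nor any vendor's product. It simply shows field checks driving a tiering decision with no separate metadata being created.

```python
from datetime import date

# Hypothetical fixed-width layout, assumed purely for illustration:
# columns 0-9 account id, 10-17 last activity (YYYYMMDD), 18 status flag.
FIELDS = {"account": (0, 10), "last_activity": (10, 18), "status": (18, 19)}


def parse_record(line):
    """Slice one fixed-format record into named fields."""
    return {name: line[start:end].strip() for name, (start, end) in FIELDS.items()}


def choose_tier(record, today):
    """Pick a storage tier from field checks alone (thresholds are assumptions)."""
    stamp = record["last_activity"]
    last = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    age_days = (today - last).days
    if record["status"] == "O" and age_days <= 30:
        return "online"      # open and recently active
    if age_days <= 365:
        return "nearline"    # touched within the past year
    return "offline"         # everything else goes to low-cost storage
```

The weakness is plain enough: the layout table and the rules have to be rewritten for every file type, which is exactly the wheel-reinventing a standard mapping would avoid.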
Don't get me wrong: I can understand that creating a standard metadata mapping to the important fields of all fixed format files, using standard syntax, means standard routines can be used instead of reinventing the wheel for each file. Provided the process is not made too complicated and long-winded to set up, or too inefficient and slow in use, there is good reason to see XAM adopted over time.
Yet the bigger problem is handling free-format content. This is necessary not least because the increasing regulatory burden includes maintaining documents (including e-mails and, soon, voice-mails) which contain free-format text. Software generally ducks the problem of looking at the content of these files as received, creating metadata for them, assigning them to appropriate storage tiers and, most importantly, properly managing them so that the vast majority can be moved to low-cost off-line storage in a matter of weeks. (A few vendors, notably Njini, have tackled this.)
Instead, organisations keep the data for years "just in case," much of it clogging up their on-line systems. If a specific compliance request comes in, a search engine may be used to try and pull out the most likely candidates by matching against appropriate key words.
Now switch that around. If appropriate key words are applied to the free-format data when it is received, as part of creating fixed format metadata to accompany the data, you have largely solved the ILM data tiering problem. (This is essentially the approach used by Njini.) Once the metadata is created, the software works from the metadata and applies policies or rules to it (and these may update it if the data changes). Apart from a speed challenge when the data is first received (it may arrive too fast for real-time metadata creation), this procedure can work. So I wonder why SNIA has not started getting into this.
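A rough Python sketch of that two-step idea follows. It is my own illustration, not Njini's method or anything from the XAM specification: the key word table, the category names and the tiering rule are all hypothetical. Key words are matched once, on receipt, to produce a small fixed format metadata record, and the policy then looks only at that record.

```python
import re

# Hypothetical key word -> category table (an assumption for illustration).
KEYWORDS = {"contract": "legal", "invoice": "finance", "complaint": "compliance"}


def create_metadata(doc_id, text):
    """Scan free-format text once, on receipt, and emit fixed format metadata."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    categories = sorted({cat for kw, cat in KEYWORDS.items() if kw in words})
    return {"id": doc_id, "categories": categories}


def apply_policy(metadata):
    """Decide the tier from the metadata alone (rule is an assumption)."""
    # Anything matching a regulated category stays retrievable on near-line
    # storage; the rest can go straight to low-cost off-line storage.
    return "nearline" if metadata["categories"] else "offline"
```

For example, `create_metadata("m1", "Please find the signed contract and invoice attached.")` yields the categories `["finance", "legal"]`, so the policy keeps that e-mail near-line, while a message with no matching key words is sent off-line. The content itself is never re-read after the metadata exists.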
Fifty companies are already participating in the SNIA initiative and its two associated technical workgroups. These include application developers from storage vendors as well as some academic bodies. Among these are some of the "big boys" who are clearly anxious to push the specification. EMC has contributed a C++ XAM Library with a Java Native Interface (JNI) wrapper, while HP has donated a Java version of the XAM Library. Sun has added code from its Sun StorageTek 5800 (previously "Project HoneyComb") for the Hypertext Transfer Protocol (HTTP) and reference vendor implementation modules (VIMs). This tells me several things:
- XAM has lift-off and the potential to become the de facto metadata standard for fixed format data. SNIA has the capability and the intention to cultivate a community for pushing the XAM standard, with an approvals procedure for XAM compatibility and conformance within software products. It can back this with industry education programmes. That's the good news.
- There is a danger that, because it is being developed by committee with lots of vested interests, the resulting solution may contain lots of bells and whistles that most do not need and which make it inordinately complicated, slow and unwieldy to use. The best ways of doing things might sometimes be circumvented because one or more of the biggest vendors realise that that approach will undermine their competitive position. Storage vendors are first and foremost in the business of making money so the biggest are especially unlikely to support an elegant approach if it cuts them out. Yet such baggage has in the past resulted in standards being ratified, only to be neglected and overtaken by other better approaches.
- Because of other objectives associated with data management, the primary ILM focus may be lost. There is evidence of this in SNIA's XAM announcement which, by the way, never mentions compliance. SNIA also announced that its Data Management Forum (DMF) is now starting to develop an application-centric standard called a Self-Describing Self-Contained Data Format (SD-SCDF); this, SNIA says, will be coupled with the XAM specification over time. SNIA says: "The SD-SCDF is aimed at providing application developers who adopt XAM, the ability to write a standard, interoperable, long-term preservation format and XAM provides SD-SCDF a strategic catalyst enabling adoption."
Without, admittedly, having investigated the detail, this very description tells me it will introduce a diversion and complexity to what is conceptually a simple enough task. So could XAM end up as a camel (a horse designed by a committee) or perhaps a submerged hippopotamus (a waterhorse designed by several committees)?! That is probably unfair to all the people working hard to produce a good spec covering all eventualities. However, if compliance matters are not central to XAM thinking I am not sure how this horse will be able to stay afloat in practice. I would be more confident if free format content was also being urgently and sensibly addressed within a very short time-frame.
XAM looks interesting and needs to be investigated closely. So I am raising these as my concerns about what will happen to XAM because there is a need and a great opportunity it can address—but I fear this will be missed. My concerns may, of course, be completely unfounded, and I would be delighted to hear from anyone who can put my mind at rest. With the right motivation and full attention to handling free format, XAM could then be of real value in achieving something like full ILM.