Has SNIA’s XAM missed the ILM target?

Written By: Peter Williams
Published:
Content Copyright © 2008 Bloor. All Rights Reserved.

Some of us interested
in full information lifecycle management (ILM) have long pointed to the need for
an industry-standard format for metadata that describes the data content of
files at a more granular level. Currently, software that uses metadata in this
way has first to create it in its own proprietary format, which is typically
unusable by any other vendor’s software wanting to access the same data.

This being
primarily a storage problem, the obvious body to drive development of a standard
is the Storage Networking Industry Association (SNIA). SNIA’s response has been
the eXtensible Access Method (XAM) specification, version 1.0 of which was
released last week. The specification has yet to gain SNIA member approval but,
assuming this is achieved by mid-year as intended, SNIA will then submit it to
ANSI and ISO for accreditation. Mid-2008 should also see the release of a XAM
SDK, available under licence to industry developers.

Hold on a moment.
I understand XAM is only addressing fixed-format content at this time. This,
though, is not the biggest problem. It may be a little inefficient, but a user
can, if need be, hard-code access to a particular file type if the format of
the individual fields is known; the code can then check fields and be as
granular as needed in deciding which storage tier to assign the data to, or
move it to, even without creating metadata.
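To make the hard-coded approach concrete, here is a minimal Python sketch that picks a storage tier by inspecting fields of a fixed-format record directly, with no metadata involved. The record layout, the `LH` class code, the tier names and the age thresholds are all hypothetical, invented for illustration; none of them comes from XAM or any vendor product.

```python
from datetime import date

# Hypothetical fixed-width record layout (an illustrative assumption, not a
# real file format): characters 0-7 hold the record date as YYYYMMDD and
# characters 8-9 hold a two-letter record-class code.
DATE_FIELD = slice(0, 8)
CLASS_FIELD = slice(8, 10)

# Hypothetical thresholds, in days, for keeping data on faster tiers.
HOT_DAYS = 30
WARM_DAYS = 365


def parse_date(text: str) -> date:
    """Convert a YYYYMMDD string into a date."""
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))


def assign_tier(record: str, today: date) -> str:
    """Pick a storage tier by checking fields in place, without creating metadata."""
    age_days = (today - parse_date(record[DATE_FIELD])).days
    record_class = record[CLASS_FIELD]
    if record_class == "LH":  # hypothetical "legal hold" class: never demoted
        return "online"
    if age_days <= HOT_DAYS:
        return "online"
    if age_days <= WARM_DAYS:
        return "nearline"
    return "offline"


if __name__ == "__main__":
    today = date(2008, 4, 1)
    print(assign_tier("20080320GN", today))  # online (12 days old)
    print(assign_tier("20070601GN", today))  # nearline (305 days old)
    print(assign_tier("20050101LH", today))  # online (legal hold)
    print(assign_tier("20050101GN", today))  # offline
```

The drawback the article points to is visible here: every file type needs its own hand-written slices and checks, which is exactly the wheel a standard metadata mapping would stop people reinventing.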

Don’t get me
wrong: I can see that creating a standard metadata mapping, in a standard syntax,
to the important fields of all fixed-format files means standard routines can be
used instead of reinventing the wheel for each file. Provided the process is not
too complicated and long-winded to set up, nor inefficient and slow in use,
there is good reason to expect XAM to be adopted over time.

Yet the bigger
problem is handling free-format content. This matters not least because the
growing regulatory burden includes retaining documents (including e-mails and,
soon, voice-mails) that contain free-format text. Software generally ducks the
problem of examining the content of these files as they are received, creating
metadata for them, assigning them to appropriate storage tiers and, most
importantly, managing them properly so that the vast majority can be moved to
low-cost off-line storage within a matter of weeks. (A few vendors, notably
Njini, have tackled this.)

Instead,
organisations keep the data for years “just in case,” much of it clogging up their
on-line systems. If a specific compliance request comes in, a search engine may
be used to try to pull out the most likely candidates by matching against
appropriate keywords.

Now switch that
around. Apply appropriate keywords to the free-format data when it is received,
as part of creating fixed-format metadata to accompany it, and you have
largely solved the ILM data tiering problem. (This is essentially the approach
used by Njini.) Once the metadata exists, the software works from it, applying
policies or rules and updating the metadata if the underlying data changes.
Apart from a speed challenge when the data is first received (it may arrive
too fast for metadata to be created in real time), this procedure can work.
So I wonder why SNIA has not started to address this.
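A minimal Python sketch of that "switch it around" procedure follows: keywords are matched once, when a document arrives, to produce fixed-format metadata, and a tiering policy then works purely from that metadata rather than from the content. The keyword lists, category names, retention periods and tier names are hypothetical; this is an illustration of the general idea, not Njini's actual method and not anything defined by XAM.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical keyword categories (illustrative assumptions only).
KEYWORDS = {
    "contract": {"contract", "agreement", "indemnity"},
    "finance": {"invoice", "payment", "audit"},
}

# Hypothetical policy: days a document stays on-line, per matched category.
ONLINE_DAYS = {"contract": 365, "finance": 180}
DEFAULT_ONLINE_DAYS = 28  # unmatched material moves off-line within weeks


@dataclass
class Metadata:
    """Fixed-format metadata record created once, when the document arrives."""
    doc_id: str
    received: date
    categories: set = field(default_factory=set)


def create_metadata(doc_id: str, text: str, received: date) -> Metadata:
    """Scan free-format text for keywords and record the matching categories."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matched = {name for name, terms in KEYWORDS.items() if words & terms}
    return Metadata(doc_id, received, matched)


def tier_for(meta: Metadata, today: date) -> str:
    """Apply the tiering policy using only the metadata, never the content."""
    limit = max((ONLINE_DAYS[c] for c in meta.categories),
                default=DEFAULT_ONLINE_DAYS)
    age_days = (today - meta.received).days
    return "online" if age_days <= limit else "offline"


if __name__ == "__main__":
    today = date(2008, 4, 15)
    m1 = create_metadata("msg-1", "Please find the signed agreement attached",
                         date(2008, 3, 1))
    m2 = create_metadata("msg-2", "Lunch on Friday?", date(2008, 3, 1))
    print(m1.categories, tier_for(m1, today))  # {'contract'} online
    print(m2.categories, tier_for(m2, today))  # set() offline
```

If a document changes, its metadata can simply be regenerated with `create_metadata` before the policy is re-applied; the expensive content scan happens only on arrival or change, which is where the real-time speed challenge mentioned above would bite.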

Fifty companies
are already participating in the SNIA initiative and its two associated
technical workgroups. These include application developers from storage
vendors as well as some academic bodies. Among them are some of the “big boys” who
are clearly anxious to push the specification. EMC has contributed a C++ XAM
Library with a Java Native Interface (JNI) wrapper, while HP has donated a Java
version of the XAM Library. Sun has added code from its Sun StorageTek 5800
(previously “Project HoneyComb”) for the Hypertext Transfer Protocol (HTTP) and
reference vendor interface modules (VIMs). This tells me several things:

  1. XAM has lift-off and the potential to become the de
    facto metadata standard for fixed-format data. SNIA has the capability and the
    intention to cultivate a SNIA community for pushing the XAM standard, with an
    approvals procedure for XAM compatibility and conformance within software
    products. It can back this with industry education programmes. That’s the good
    news.
  2. There is a danger that, because it is being
    developed by committee with lots of vested interests, the resulting solution may
    contain lots of bells and whistles that most do not need and which make it
    inordinately complicated, slow and unwieldy to use. The best ways of doing
    things might sometimes be circumvented because one or more of the biggest
    vendors realise that that approach will undermine their competitive position.
    Storage vendors are first and foremost in the business
    of making money, so the biggest are especially unlikely to support an elegant
    approach if it cuts them out. Such baggage has, in the past, led to standards
    being ratified only to be neglected and overtaken by better approaches.
  3. Because of other objectives associated with data
    management, the primary ILM focus may be lost. There is evidence of this in
    SNIA’s XAM announcement, which, by the way, never mentions compliance. SNIA also
    announced that its Data Management Forum (DMF) is now starting to develop an
    application-centric standard called a Self-Describing Self-Contained Data
    Format (SD-SCDF); this, SNIA says, will be coupled with the XAM specification
    over time. SNIA says: “The SD-SCDF is aimed at providing application developers
    who adopt XAM, the ability to write a standard, interoperable, long-term
    preservation format and XAM provides SD-SCDF a strategic catalyst enabling
    adoption.”

Admittedly I have not investigated the detail, but
this description alone suggests to me that it will introduce a diversion and
added complexity into what is conceptually a simple enough task. So could XAM
end up as a camel (a horse designed by a committee) or perhaps a submerged
hippopotamus (a water horse designed by several committees)?! That is probably
unfair to all the people working hard to produce a good spec covering all
eventualities. However, if compliance matters are not central to XAM thinking,
I am not sure how this horse will be able to stay afloat in practice. I would be
more confident if free-format content were also being addressed, urgently and
sensibly, within a very short time-frame.

XAM looks
interesting and needs to be investigated closely. I raise these concerns
because there is a real need, and a great opportunity, that XAM can address,
but I fear the opportunity will be missed. My concerns may, of course, be
completely unfounded, and I would be delighted to hear from anyone who can put
my mind at rest. With the right motivation and full attention to handling
free-format content, XAM could be of real value in achieving something like
full ILM.