Developing an ALM data model for OSLC - Exploiting the data associated with the development process for analytics

Content Copyright © 2013 Bloor. All Rights Reserved.

OSLC (Open Services for Lifecycle Collaboration), a set of open specifications for integrating different tools, is a very welcome initiative, and the news that IBM has just transferred its management to OASIS (an independent open-standards organisation) is equally welcome. Perhaps more non-IBM tools vendors will now support OSLC and there will be less need for Enterprise Development ALM (Application Lifecycle Management) “rip-and-replace”.

However, is there something missing from OSLC? Shouldn’t there be a data model for ALM behind OSLC, so that when you mine the data that ALM processes should produce (to track defects reducing over time, perhaps; or to assess the effectiveness of various tools at finding defects; or anything else associated with “fact-based” management of the ALM processes), you are using consistent terminology and aggregating consistent entities?

Unfortunately, a formal ALM data model appears to be the missing part of OSLC, according to Dave West (once a Forrester analyst and now Chief Product Officer at Tasktop), and he’s trying to do something about it, assisted by Tim Mulligan (ALM Architect, Fidelity Investments – and that’s a novel job title). Both presented at the lifecycle management track at Innovate 2013 and seem to have a little “special interest group” going (which is NOT an OSLC development group) – contact Dave West at dave.west@tasktop.com to join. West is actively pushing for the ALM data model to become an official part of OSLC at OASIS.

According to Martin Nally (once Rational CTO and now working on delivering a new product for IBM), OSLC is founded on a data model. In my opinion, it’s just that it’s not one that is terribly useful for analytics across ALM processes/tools.

The OSLC data model is RDF (this is what lets it create Linked Data), although OSLC only uses a subset of the RDF model; this is all about the internal structure of OSLC and its operation. It is only one of the possible OSLC data models, and it doesn’t really help you make sense of analytics against ALM processes across different ALM tools linked via OSLC. If I type “OSLC data model” into Google, by the way, all I get is some documents from the EU CESAR project (“CESAR” stands for “cost-efficient methods and processes for safety relevant embedded systems”; it’s a European-funded project which addresses issues similar to those tackled by OSLC).
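
To make this concrete, here is a minimal sketch, in Python with the rdflib library, of how an OSLC-style change request might look as RDF triples (the resource URI and field values are invented for illustration). The triples make the record machine-readable and linkable, but nothing in them says whether two tools mean the same thing by “status” or “defect”:

    # A minimal sketch of an OSLC-style change request as RDF Linked Data.
    # Assumes the rdflib library (pip install rdflib); the defect URI and
    # property values are invented for illustration.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    OSLC_CM = Namespace("http://open-services.net/ns/cm#")  # OSLC Change Management vocabulary

    g = Graph()
    g.bind("oslc_cm", OSLC_CM)
    g.bind("dcterms", DCTERMS)

    defect = URIRef("https://example.com/bugtracker/defects/42")  # hypothetical resource
    g.add((defect, RDF.type, OSLC_CM.ChangeRequest))
    g.add((defect, DCTERMS.title, Literal("Login page times out under load")))
    g.add((defect, OSLC_CM.status, Literal("In Progress")))  # status values are tool-specific

    # The record is perfectly interoperable as triples...
    print(g.serialize(format="turtle"))
    # ...but whether another tool's "In Progress" means the same thing is a
    # semantic question the RDF model alone does not answer.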

This, it seems to me, all represents an example of a common standards issue – standards often allow tools to communicate or interoperate, even though the semantic results of this communication may not make sense – further “semantic standards” are needed. The example I usually use is “customer data” – there are lots of standards that allow different databases collecting customer data to interoperate and even combine datasets. However, if process A defines “customer” as “a defined legal entity with which we have an audited commercial relationship including credit ratings and so on” and process B defines “customer” as “anyone who logs into our website and gets a password by typing unverified information into our ‘customer account’ form”, any analytics program that operates across the consolidated “customer database” from Process A and Process B will deliver very misleading results.
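
A toy illustration of the point, as a hedged Python sketch (the data and field names are invented): the consolidation below “works” mechanically, so analytics run without complaint, but the headline number is meaningless.

    # A toy illustration (hypothetical data) of the "customer data" problem:
    # the two processes interoperate happily, but define "customer" differently.

    # Process A: audited legal entities with a commercial relationship.
    process_a_customers = [
        {"id": "A1", "legal_entity": "Acme Ltd", "credit_rated": True},
        {"id": "A2", "legal_entity": "Widget plc", "credit_rated": True},
    ]

    # Process B: anyone who filled in the website "customer account" form.
    process_b_customers = [
        {"id": "B1", "login": "acme_buyer"},      # possibly the same firm as A1
        {"id": "B2", "login": "random_visitor"},  # unverified; may not be a customer at all
    ]

    # Mechanically, the consolidation just works, so the analytics run happily...
    consolidated = process_a_customers + process_b_customers
    print(f"'Customers' in consolidated database: {len(consolidated)}")  # -> 4

    # ...but the count mixes audited legal entities with unverified web signups
    # (and probably double-counts the overlap), so it misleads.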

OSLC, it seems to me, has a similar issue. It allows tools to interoperate, in a way that is useful to people who know what they are doing and understand the semantics of the different interoperating tools. However, if you want to run smart analytics against the whole ALM process, running on tools linked by OSLC, then you need something more (probably the ALM data model Dave West talks about) if the smart analytics are going to make sense. The view of IBM’s OSLC team seems to be that the OSLC “data model continues to be enhanced, and all parts of the OSLC community are encouraged to help in this; Dave West’s activities can be seen as part of this”. Which is fine, as that is how standards are supposed to evolve, but it does suggest that there is room for another OSLC data model.

Success for a new ALM model that is useful to users of OSLC-connected tools will come, I think, if or when it delivers tangible benefits for organisations exploiting ALM data for analytics-based insights into the whole of the ALM process they use, across projects. According to Dave West, “without an ALM Data Model, each project’s implementation of its own model makes the value of a cross project data view almost pointless. I believe that the data model will come out of the need for reporting/data warehouses and that will lead us to a consistent lexicon for ALM”.
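
As a purely illustrative sketch of what such a “consistent lexicon” might look like (the common shape, field names and per-tool record formats below are all invented; a real ALM data model would be whatever the standardisation effort agrees), each tool’s native defect records are mapped into one agreed model before any cross-project query is run:

    # A hedged sketch of a "consistent lexicon" for cross-project defect reporting.
    # The common model and the per-tool mappings are invented for illustration.
    from dataclasses import dataclass
    from datetime import date
    from typing import Optional


    @dataclass
    class Defect:
        """One agreed, tool-neutral definition of a defect."""
        project: str
        opened: date
        closed: Optional[date]  # None while still open
        severity: str           # normalised to "low" | "medium" | "high"


    def from_tool_x(record: dict) -> Defect:
        """Map tool X's native record (hypothetical format) into the common model."""
        return Defect(
            project=record["proj"],
            opened=date.fromisoformat(record["created"]),
            closed=date.fromisoformat(record["resolved"]) if record.get("resolved") else None,
            severity={"1": "high", "2": "medium", "3": "low"}[record["sev"]],
        )


    def from_tool_y(record: dict) -> Defect:
        """Map tool Y's native record (hypothetical format) into the common model."""
        return Defect(
            project=record["project_name"],
            opened=date.fromisoformat(record["open_date"]),
            closed=date.fromisoformat(record["close_date"]) if record.get("close_date") else None,
            severity=record["priority"].lower(),
        )


    # With every tool mapped into the same model, cross-project questions such as
    # "how many high-severity defects are still open?" become simple queries.
    defects = [
        from_tool_x({"proj": "payments", "created": "2013-05-01", "sev": "1"}),
        from_tool_y({"project_name": "payments", "open_date": "2013-05-02",
                     "close_date": "2013-05-10", "priority": "Medium"}),
    ]
    open_high = sum(1 for d in defects if d.closed is None and d.severity == "high")
    print(f"Open high-severity defects across tools: {open_high}")  # -> 1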

And, I think, that’s really just enabling another Big Data project and, as usual, Big Data is more about finding people with the creative insight to exploit radically different data sources than it is about having more data (and the metadata needed to make sense of it) provided by some new tool. Nevertheless, I think that exploiting the data associated with the systems development lifecycle across ALM tools will be found worthwhile – and that then the availability of a formal ALM data model may make all the difference to practical applications.