A new approach to data integration

All ETL (extract, transform and load) and data integration products work in essentially the same way. Some generate code and some are black boxes, and some support or recommend ELT rather than ETL, but they all have broadly similar architectures. In particular, they have all been designed for developers, even though some of the major vendors, such as IBM, Business Objects and Informatica, have been adding more business-oriented capabilities. The key point is that these facilities have been added on rather than designed in.

expressor software has just announced its new product, which takes a fundamentally different approach to ETL. In a nutshell, whereas IBM (for example) offers a Business Glossary as an add-on to DataStage, expressor has its equivalent at the heart of its offering, around which the whole product has been built. There is more to it than that, though.

Specifically, expressor is semantically aware. That is to say, it knows that customer# is the same thing as customer_no, which is the same as customer ID, and so on. In practice this means two things. First, the environment is much easier for business analysts to use in collaboration with developers, because both work with common terminology. Second, mappings between sources and targets that contain these equivalents can be automated and do not need to be specified by hand.
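To make the idea concrete, here is a minimal sketch of how name equivalences could drive automatic source-to-target mapping. It is an illustration only and says nothing about how expressor actually implements this: the GLOSSARY contents, the cust_id and order_dt aliases, and the normalize, resolve and auto_map helpers are all hypothetical names invented for the example.

```python
import re

# Hypothetical glossary: canonical business term -> known field-name variants.
# The entries are illustrative only, not taken from any real product.
GLOSSARY = {
    "customer_id": {"customer#", "customer_no", "customer ID", "cust_id"},
    "order_date": {"order_dt", "date_of_order"},
}


def normalize(name):
    """Lower-case a field name and collapse spaces, underscores and hyphens."""
    return re.sub(r"[\s_\-]+", "_", name.strip().lower())


# Reverse index: every normalized alias (and the term itself) -> canonical term.
ALIAS_TO_TERM = {
    normalize(alias): term
    for term, aliases in GLOSSARY.items()
    for alias in aliases
}
ALIAS_TO_TERM.update({normalize(term): term for term in GLOSSARY})


def resolve(name):
    """Return the canonical term for a field name, or None if it is unknown."""
    return ALIAS_TO_TERM.get(normalize(name))


def auto_map(source_fields, target_fields):
    """Pair source and target fields that resolve to the same canonical term.

    Returns (mapping, unmapped): the automatic pairings, plus the source
    fields that still need a mapping to be specified manually.
    """
    target_by_term = {}
    for field in target_fields:
        term = resolve(field)
        if term is not None:
            target_by_term.setdefault(term, field)

    mapping, unmapped = {}, []
    for field in source_fields:
        term = resolve(field)
        if term is not None and term in target_by_term:
            mapping[field] = target_by_term[term]
        else:
            unmapped.append(field)
    return mapping, unmapped


mapping, unmapped = auto_map(
    ["customer#", "order_dt", "notes"],
    ["customer ID", "order_date", "comments"],
)
```

In this sketch, customer# and order_dt are paired with customer ID and order_date without any mapping being written, while notes is left for a developer or analyst to handle. Adding an organisation-specific equivalence would simply mean adding another alias to the glossary.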

Now, you might assume that you have to define these semantic equivalents and, indeed, in specialised circumstances you may need to. But the product, which will ship in June, will come with, to quote expressor, “hundreds of thousands” of built-in name correlations, pertaining to a range of different vertical markets. So, if you have to define your own equivalences they should be few and far between.

Of course, this doesn’t mean that all transformations are automated, but it does mean that a lot of the initial legwork is done for you, which significantly reduces the effort and time involved in defining data integration processes. Moreover, reuse is an automatic consequence of the semantic approach adopted by expressor, and this also applies to business and transformation rules, since these share the same semantic basis.

In other respects, expressor is not too different from other data integration offerings, though it is surprisingly complete (see www.expressor-software.com) for a first general release, with version control, team development capabilities, project management, the usual graphical flow design interface, performance metrics, role-based security and so on. The expressor parallel processing engine runs under Windows, Linux and UNIX and on IBM mainframes, while the associated tools are either Windows-based or web-based. Project-based licenses are available and prices start at $20,000.

It is too early to predict how disruptive this will be, but that is clearly the intention: to offer an innovative and appealing alternative to traditional approaches. I will watch and await developments with interest.