Open Source ETL

Content Copyright © 2007 Bloor. All Rights Reserved.

Some 18 months ago I wrote an article called “The case for open source ETL”. At that time I had identified four products (one of which, Kettle, has since been acquired by Pentaho) in the open source data integration market. Amongst the various comments posted by readers were a number of recommendations to look at alternative products, and I have finally (better late than never!) got around to looking at Talend.

There are several interesting things about Talend. To begin with, its product was in development for three years (a long time for an open source product) before it even reached beta testing, which was in August 2005. Moreover, its first official release was not until October last year, which explains why it wasn’t in my previous article. More recently, it released Talend Open Studio v2.0 in April this year and Talend on Demand (a software as a service offering) this month.

Secondly, Talend is French and only opened a US office earlier this year. This in itself isn’t particularly interesting but it reflects the fact that far more open source products are coming out of Europe nowadays: for example, three of the four open source ETL products referenced in my previous article were European, as is Talend; while JasperSoft, the open source BI vendor, is based on work that was originally (and continues to be) done in Italy and Romania. I don’t have space here to discuss the significance of this but it is certainly food for thought.

The third thing that is interesting about Talend is that it is not a black-box solution but is instead a code generating product that produces Java and/or Perl, plus (embedded) SQL as needed. Now, when ETL first started, the early products (of which the most notable survivor is ETI) were all code generating offerings. However, back in the early and mid-nineties, which is when we are talking about, there was no such thing as portable code. You had to re-generate your software to run it across heterogeneous systems, you ended up with multiple versions of the same software, and the whole thing was a mess. As a result, we saw the rise of black-box solutions like Informatica PowerCenter and Ardent DataStage (now part of the IBM Information Server). At the time, portability was more important than the fact that a black box can be a bottleneck, or that the compiled code derived from a code generating product will typically perform better than the interpreted code in a black box.
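To make the distinction concrete, here is a minimal sketch in Python. It is not Talend's generator (which emits Java or Perl) and every name in it, including the MAPPING specification, is hypothetical. It simply contrasts an engine that re-reads a mapping specification for every row with a generator that turns the same specification into source code once and then runs the result.

```python
# Hypothetical mapping spec: output column -> expression over an input row.
# Illustration only; this is not how any real ETL product stores its mappings.
MAPPING = {
    "full_name": "row['first'] + ' ' + row['last']",
    "age_next_year": "row['age'] + 1",
}


def run_interpreted(mapping, rows):
    """Black-box style: the engine evaluates the spec again for every row."""
    return [
        {col: eval(expr, {}, {"row": row}) for col, expr in mapping.items()}
        for row in rows
    ]


def generate_source(mapping):
    """Code generating style: emit a standalone function as source text."""
    lines = ["def transform(row):", "    return {"]
    for col, expr in mapping.items():
        lines.append(f"        {col!r}: {expr},")
    lines.append("    }")
    return "\n".join(lines)


def compile_transform(source):
    """Compile the generated source once and return the resulting function."""
    namespace = {}
    exec(source, namespace)
    return namespace["transform"]


rows = [{"first": "Ada", "last": "Lovelace", "age": 36}]
source = generate_source(MAPPING)
transform = compile_transform(source)

generated_result = [transform(row) for row in rows]
interpreted_result = run_interpreted(MAPPING, rows)
```

Both routes produce the same rows. The difference is that the generated source is an ordinary program that can be read, versioned and deployed without the design tool, which is the attraction of the approach now that portable languages such as Java are available.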

Talend has made a bold decision in going with code generation, because the approach hasn’t been popular for a long time; but, as I have just indicated, it is time that this view was re-evaluated. Nor should the choice hold Talend back at the heavy lifting end of the market: not only does it support ELT as well as ETL (see below), it is also able to leverage a server grid in order to parallelise processing. So, Open Studio should be suitable for both operational data integration and for use with decision support systems.

In technical terms, Talend supports both ETL and ELT in the latest release (actually, it was designed with that in mind from the outset but not all the features were complete in the first version); it is metadata-driven based on a central repository of project information; it includes a real-time debugger; it has the usual graphical user interface with a palette onto which you drag and drop icons; it allows distributed processing through support for grid architectures; it supports web services; and, most interesting of all, it supports business-oriented process modelling.

This last point is important. Informatica and its ilk do not yet support modelling at the business level alongside the more conventional technical-level modelling, which means that, for the time being and in this area in particular, Talend actually has a lead over the big boys.

Finally, on the commercial side, Talend has a growing number of both end users and OEM partners. In the latter case, for example, JasperSoft OEMs Open Studio and markets it as part of its BI suite. Other such partnerships will be announced in due course. Perhaps more significantly, if you go to the web site of Talend and then go to the web sites of CloverETL, KETL, Enhydra Octopus (all pure play open source ETL vendors like Talend) and so on, you will see that Talend’s looks far more professional. Image, I know, but it is indicative of where Talend is and where it thinks it is going. Talend may have entered the open source ETL market late but it looks to me like it will soon be the clear market leader in this space if it isn’t already.