ETL vendors don’t understand content

Content Copyright © 2008 Bloor. All Rights Reserved.

Over the last year or so, leading data integration vendors such as Informatica and Business Objects (SAP) have been making much of the fact that they can support unstructured data. That is, you can use these tools to move, say, Word documents around. This is absolutely true and it is useful enough in its way.

But there’s a big difference between a Word document and content. Why do you think that the ECM (enterprise content management) vendors all have their own tools for migrating data from competitors’ platforms? And why does everybody use these (or an independent product such as EntropySoft’s, about which I wrote here recently) rather than an ETL (extract, transform and load) tool?

The answer is that ETL vendors extract Word documents at the database level, whereas EntropySoft and companies in the ECM space interface at the application level and therefore understand the context within which that content is stored and the metadata surrounding it.
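
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `ToyEcm` class, its `blob_table`, the `export_items` method and the sample metadata are invented for illustration and do not correspond to any vendor's product or API. It simply assumes an ECM system that stores document bytes in one place and keeps the knowledge of what those bytes mean (which document, which version, who wrote it, how long it must be retained) at the application level.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """A document together with the context the application holds about it."""
    item_id: str
    version: int
    author: str
    retention: str
    data: bytes


class ToyEcm:
    """Hypothetical ECM system, invented purely for illustration."""

    def __init__(self):
        # What a database-level tool sees: opaque rows of bytes.
        self.blob_table = {1: b"draft text", 2: b"approved text"}
        # What only the application knows: which blob is which version
        # of which document, who wrote it, and how long to keep it.
        self._catalog = [
            ("SOP-7", 1, "alice", "7y", 1),
            ("SOP-7", 2, "bob", "7y", 2),
        ]

    def export_items(self):
        """Application-level interface: content plus its metadata."""
        return [
            ContentItem(item_id, version, author, retention, self.blob_table[blob_id])
            for (item_id, version, author, retention, blob_id) in self._catalog
        ]


def extract_database_level(ecm):
    """Database-level extraction: the bytes arrive, the context does not."""
    return [ecm.blob_table[key] for key in sorted(ecm.blob_table)]


ecm = ToyEcm()
raw = extract_database_level(ecm)
items = ecm.export_items()

print(raw)
print([(item.item_id, item.version, item.author) for item in items])
```

In this toy case both routes move the same bytes, but only the second can tell you that the two files are successive versions of one document, which is the sort of thing an auditor will ask about.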

Now, all the major players in the data integration space, and even some of the smaller ones, make much of the fact that they have connectors with a deep understanding of a variety of application environments such as SAP, Oracle and so on. When it comes to moving data, it is a clear advantage to understand the context within which that data runs or will run. So why don’t these suppliers apply the same principle to content? It’s not even as if they would have to develop their own connectors for the various ECM systems: they could license them from third parties if they preferred.

Actually, I think the ETL vendors are simply unaware that there is an issue with moving content. They think that content is just another word for unstructured data and that they know all about moving data. It hasn’t occurred to them that there are major auditing and validation issues that arise whenever regulatory control is applicable, for example whenever the FDA (Food and Drug Administration) or the SEC (Securities and Exchange Commission) is involved.

The problem here is that the big data integration players are pitching their products as platforms that will serve all your data requirements. The truth is that right now they are very far from offering such capability. Content is one major element that is missing. With a few exceptions, so are replication and synchronisation. Data integration has a long way to go yet.