Analyst Coverage: Philip Howard
Data integration is a set of capabilities that allow data held in one place to be used in another, regardless of how it is formatted. This may or may not involve physically moving the data. Physical movement technologies include ETL (extract, transform and load), ELT (extract and load the data before transforming it) and variations thereof, as well as data replication and B2B exchange (which is essentially a use case). These may be supported by change data capture and other associated techniques. Data virtualisation is the technology used for accessing data without moving it.
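The difference between ETL and ELT is purely one of ordering, which a small example may make concrete. The following is a minimal sketch in Python, using an in-memory SQLite database to stand in for the target system. The sample rows, table names and the two functions `etl` and `elt` are all hypothetical and chosen only for illustration; real integration tools do considerably more.

```python
import sqlite3

# Hypothetical source rows: amounts arrive as text, in pence.
SOURCE_ROWS = [("alice", "1250"), ("bob", "300")]


def etl(rows):
    """ETL: transform in the integration layer, then load the clean result."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (customer TEXT, amount_gbp REAL)")
    # The transformation happens here, before the target sees any data.
    transformed = [(name.upper(), int(pence) / 100) for name, pence in rows]
    target.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return target.execute(
        "SELECT customer, amount_gbp FROM sales ORDER BY customer"
    ).fetchall()


def elt(rows):
    """ELT: load the raw data first, then transform inside the target."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE raw_sales (customer TEXT, amount_pence TEXT)")
    target.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # The transformation is pushed down to the target's own SQL engine.
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT upper(customer) AS customer, "
        "CAST(amount_pence AS INTEGER) / 100.0 AS amount_gbp "
        "FROM raw_sales"
    )
    return target.execute(
        "SELECT customer, amount_gbp FROM sales ORDER BY customer"
    ).fetchall()
```

Both functions produce the same target table; what differs is where the transformation work is carried out, which is why ELT tends to be favoured when the target platform has processing capacity to spare.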
Here we specifically define data integration as encompassing more than just ETL or ELT (or a variant thereof): it will also include replication, B2B exchange and/or data virtualisation as additional capabilities. Data integration providers also frequently offer data quality, data governance and/or master data management (MDM) capabilities (amongst others) as integral parts of their data integration platform.
In order to combine and use disparate data held in different formats, you need to transform the data into a consistent format. When data virtualisation is being used, this means creating a virtual view of the data and then using the tool’s abstraction layer to provide that consistent format. When the data is physically moved, however, the data itself has to be transformed as a separate process. This is often true even for replication, depending on the purpose for which replication is being used (for example, for back-up purposes or to support real-time business intelligence).
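The contrast between a virtual view and a physically transformed copy can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular product: the two source lists, their field names and the functions `virtual_view` and `physical_copy` are hypothetical.

```python
from datetime import datetime

# Hypothetical sources holding the same kind of record in different formats.
CRM_ROWS = [{"name": "Acme Ltd", "joined": "31/01/2014"}]
ERP_ROWS = [{"customer_name": "Bolt plc", "join_date": "2014-02-15"}]


def from_crm(row):
    """Map a CRM record to the consistent format (ISO dates)."""
    joined = datetime.strptime(row["joined"], "%d/%m/%Y").date()
    return {"customer": row["name"], "joined": joined.isoformat()}


def from_erp(row):
    """Map an ERP record to the consistent format (already ISO dates)."""
    return {"customer": row["customer_name"], "joined": row["join_date"]}


def virtual_view():
    """Virtualisation: nothing is stored; rows are converted as they are read."""
    for row in CRM_ROWS:
        yield from_crm(row)
    for row in ERP_ROWS:
        yield from_erp(row)


def physical_copy():
    """Physical movement: transform once and keep a separate, stored result."""
    return [from_crm(r) for r in CRM_ROWS] + [from_erp(r) for r in ERP_ROWS]
```

A stored copy reflects the sources as they were when the transformation ran, whereas the view reflects them at the moment it is read. That trade-off, between freshness and the cost of converting on every access, is one reason both approaches continue to coexist.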
One notable use case for data integration technologies is data migration, although this will often require other technologies as well, such as data archival, data discovery, data quality and data masking.
Data integration, in all of its forms, is an enabling technology rather than a solution in its own right: it is used to create data warehouses and to exchange information with business partners and between applications. Thus it is most likely to be of interest to CIOs and IT architects.
However, there is an increasing trend towards the deployment of SaaS (software as a service) applications, and this is often done at the behest of line-of-business managers. Surveys have suggested that 51% of companies deploying new SaaS applications have had issues with data integration as part of the implementation process and that 19% of projects were cancelled for the same reason. So, increasingly, business executives need to care about data integration.
Historically, replication, ETL and data virtualisation have been regarded as separate technologies and, indeed, there remain many vendors that specialise in just one of these areas. Even platform vendors in this space often effectively have separate solutions across these areas. However, the move towards the logical data warehouse mandates that these technologies become more closely aligned, as replication, virtualisation and ETL will all be required to support such environments.
In addition, the trend towards SaaS deployments creates a major headache, not just for loading data into the new application environment but also in supporting cross-application integration and information sharing. There needs to be a major shift towards the automation of connectivity in these sorts of environments. Vendors such as Dell (Boomi) and Pervasive are certainly talking about this, but to what extent it is actually happening remains to be seen.
Companies such as IBM, Informatica and Talend continue to make acquisitions to broaden their data integration portfolios. In the case of Talend this means extending beyond data integration and into application integration and business process management.
However, broader product portfolios do not necessarily mean that the products themselves are integrated, and this space remains bedevilled, with some exceptions, by loosely connected, diverse products that are marketed as a ‘platform’.
In general, we believe that data integration tools are not easy enough to use, are not automated enough and do not perform as well as they should. The market is ripe for some innovative developments.