Matillion was founded in Manchester, England in 2011, and Matillion ETL was launched in 2015. Since the launch, the company has established a second headquarters in the United States. In recent years it has frequently been described as one of the fastest-growing privately owned tech companies in the UK; it is backed by venture capital.
The company has more than 1,100 customers worldwide, including such household names as Amazon, Sony, Nintendo, Subway, and Cisco, as well as various mid-market enterprises. It also has a significant partner network.
Matillion ETL (2021)
Last Updated: 4th February 2021
Mutable Award: Gold 2020
Matillion ETL is a cloud-native data integration tool. However, it is misnamed: it should really be Matillion ELT because its processing paradigm is to extract data from source systems, load that into the relevant target and then transform the data as appropriate for the use case. This is illustrated in Figure 1.
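To make the extract, load, then transform ordering concrete, here is a minimal sketch in Python. It is not Matillion code: an in-memory SQLite database stands in for the cloud data warehouse, and the table names, column names and the `run_elt` function are hypothetical, chosen purely for illustration. The point is only that the raw rows are landed unchanged and the transformation is then executed by the target's own SQL engine.

```python
import sqlite3


def run_elt(source_rows):
    """Extract rows, load them unchanged, then transform inside the target."""
    # Assumption: SQLite stands in for the cloud data warehouse target.
    target = sqlite3.connect(":memory:")

    # Extract + Load: land the source rows as-is in a raw staging table.
    target.execute("CREATE TABLE raw_orders (customer TEXT, amount_pence INTEGER)")
    target.executemany("INSERT INTO raw_orders VALUES (?, ?)", source_rows)

    # Transform: performed after loading, by the target's SQL engine.
    target.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer, SUM(amount_pence) / 100.0 AS total_pounds
        FROM raw_orders
        GROUP BY customer
        """
    )
    return target.execute(
        "SELECT customer, total_pounds FROM customer_totals ORDER BY customer"
    ).fetchall()


rows = [("acme", 1250), ("acme", 750), ("globex", 500)]
totals = run_elt(rows)
print(totals)
```

In a traditional ETL flow the aggregation would instead run in a separate integration engine before anything reached the target.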
A second tool, Matillion Data Loader, is also provided to facilitate migrations from existing environments into cloud data warehouses and data lakes. This is illustrated in Figure 2. It is a SaaS-based application that is provided free of charge, and it offers a wizard-driven environment that does not require any coding. Note that, unlike Matillion ETL, Data Loader does not currently support Azure Synapse Analytics. Note also that Data Loader supports a more limited set of data sources.
Matillion also offers a significant number (90+) of source connectors. However, most of these are application-oriented, and native connectors to databases (much to be preferred for performance reasons) are relatively limited, though JDBC is supported. There are facilities to build your own connectors (using a REST API), but this will only be relevant to applications and, in any case, it applies only to source systems and not to targets. The company plans to introduce a community portal where users can exchange connectors as well as other integration artefacts.
“Matillion enables our team to provide meaningful data insights quickly. And, because it’s built for modern cloud data warehouses, we can use native Snowflake functionality to transform our data.”
Matillion’s primary strength and major differentiator lies in the tightness of its integration with its target environments. Some of the significant capabilities it supports are highlighted in Figure 3. This depth of capability is excellent for supported environments, which will include Databricks in the next release, but it also explains why Matillion does not support more targets: there is a significant investment in each target, and it is not simply a question of building a connector. It therefore limits the use of Matillion. If, for example, you are an existing Teradata customer that wants to migrate to Teradata Vantage in the cloud, then Matillion would not be the supplier to choose. That said, the company is starting to introduce what it calls “tactical output connectors”. The first of these is for Salesforce, with Intercom, Oracle and SQL Server to follow. Support for cloud object storage, such as Amazon S3, is also provided.
In practice, Matillion works by generating SQL. It makes extensive use of dynamic variables in order to minimise code regeneration (needed, for example, whenever there are schema changes), which is the biggest downside of any code-generating tool. Other notable points are that, while Matillion itself does not provide change data capture (CDC), it does support technologies such as Snowflake Streams, which provide equivalent functionality. There is a built-in scheduler, or you can define listeners that will trigger jobs when required.
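The general idea of using variables to avoid regenerating code can be sketched as follows. This is an illustration of the technique only, not Matillion's actual variable syntax or generated output: the `${...}` placeholders, the `render` function and the table and column names are all assumptions. The SQL text is produced once, and a schema change (here, an added column) is absorbed by changing a variable value rather than by regenerating the statement.

```python
import re
from string import Template

# Generated once at design time; ${...} placeholders are resolved at run time.
# Assumption: this placeholder style is illustrative, not Matillion's syntax.
GENERATED_SQL = Template(
    "INSERT INTO ${target_table} (${columns}) "
    "SELECT ${columns} FROM ${source_table}"
)

# Identifiers cannot be passed as bound parameters, so validate them instead.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render(source_table, target_table, columns):
    """Resolve the variables in the pre-generated SQL for one job run."""
    for name in (source_table, target_table, *columns):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return GENERATED_SQL.substitute(
        source_table=source_table,
        target_table=target_table,
        columns=", ".join(columns),
    )


before = render("stg_customers", "dw_customers", ["id", "name"])
# The source schema gains an 'email' column: only the variable value changes.
after = render("stg_customers", "dw_customers", ["id", "name", "email"])
print(before)
print(after)
```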
Compared to some traditional data integration vendors, Matillion has the advantage that it is cloud-native, with all that that implies. More specifically, its integration with supported targets is of greater depth than you would normally get from competitive suppliers. The support for application-based sources is extensive. And because of the free data loader, Matillion ETL is suited both to new implementations of these data warehouses and to cases where companies are migrating to these environments. That said, all the traditional data warehousing vendors are introducing, or have already introduced, cloud-based versions of their various products. While much of the buzz about cloud data warehouses over the last few years has been about the targets that are currently supported by Matillion, we don’t expect this to last. We would therefore like to see an acceleration in Matillion’s support for other warehouses and data lakes.
The Bottom Line
Matillion is a pure-play ELT vendor with excellent support for the environments it targets. If that fits your requirements, it is worth serious consideration. Indeed, it should definitely be on your shortlist.
Matillion ETL (2022)
Last Updated: 3rd October 2022
Matillion ETL is a cloud-native data integration tool that sits inside your virtual private cloud. Despite its name, it operates using an ELT paradigm, which is to say that it transforms the data after, rather than before, it has been loaded into its target environment. A second tool, Matillion Data Loader, exists to facilitate migrations from existing environments into cloud data warehouses and data lakes via the creation of data pipelines. Both products natively support all the major cloud platforms, including cloud data warehouses such as Snowflake and Databricks. That said, Matillion Data Loader supports slightly fewer data sources than the main product. The company also offers Matillion Hub, a ‘central nervous system’ for managing your Matillion services, and Matillion Exchange, a marketplace for community-created assets.
“Matillion Data Loader allowed us to solve our problem without any changes required to the source and we were live within weeks.”
“We chose Matillion due to its cloud-native architecture, ease of deployment, adaptability, and smaller ramp-up curve. We have achieved 5X improvement in our data processing speed.”
Matillion ETL primarily offers a graphical, low-code development user interface for architecting your ELT processes. This of course includes transformations, as well as cloud infrastructure orchestration. It is deployed – as mentioned, to your virtual private cloud – as one (or more) of a series of virtual machine images, each one tailored for a specific cloud platform. APIs are available for capturing data lineage and other metadata as part of your ELT processes. The product also provides infrastructure management capabilities within the same (low-code) environment.
Under the hood, Matillion ETL works by generating SQL code. It makes extensive use of dynamic variables in order to minimise any required code regeneration (needed, for example, whenever there are schema changes), which is the biggest downside of any code generating tool. Other notable points are that, while Matillion ETL itself does not provide CDC, it does support technologies – such as Snowflake Streams – which provide equivalent functionality. In addition, Matillion Data Loader offers CDC (see below). There is also a built-in job scheduler, and you can define listeners that will trigger jobs when required.
Matillion ETL offers a significant number (90+) of source connectors. These are primarily application-oriented. There are also native connectors (much to be preferred for performance reasons) to most popular relational databases, although the overall number of database connectors is relatively limited. Facilities exist to build your own connectors (using a REST API) to source applications and systems, but such functionality does not currently extend to targets.
That said, one of Matillion ETL’s primary strengths lies in the tightness of its integration with its target environments. This depth of capability is excellent for supported environments, but it also explains why Matillion ETL does not support building your own target connectors: there is a significant investment in each target, and it is not simply a question of building a connector. Matillion ETL supports cloud platforms, including the ‘big three’ of AWS, Azure and Google Cloud, as well as Snowflake, Redshift, and Databricks. Multi-cloud is supported on all of Matillion’s platforms. Support for cloud object storage, such as Amazon S3, is also provided.
Matillion Data Loader is a no-code offering for building data pipelines. It leverages an agent-based, hybrid-SaaS architecture – it operates like a SaaS, but your data always remains within your environment – and a freemium, consumption-based pricing model. For most potential customers, it will be most appealing as a fast and easy way to get your data into the cloud. It is even simpler to use than Matillion ETL, and like its sister product it offers a broad range of connectivity options, including compatibility with batch replication and CDC.
On the other hand, it doesn’t have a transformative capability: it is strictly a migratory tool. On the third hand, Matillion ETL itself (and, frankly, most data integration offerings) will be overkill for the sorts of tasks that don’t require transformations. Notably, and unlike Matillion ETL, Matillion Data Loader also offers CDC. This is log-based, and can integrate with your other data integration processes (for example, to trigger downstream transformations).
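As a rough illustration of what log-based CDC involves, the sketch below replays an ordered list of change events against a target and then notifies a downstream step. It is a generic outline of the technique under stated assumptions, not Matillion Data Loader's implementation: the event format (`op`, `key`, `row`), the `apply_changes` function and the `on_applied` callback are all hypothetical.

```python
def apply_changes(target, change_log, on_applied=None):
    """Replay ordered change events against a target keyed by primary key.

    Assumption: each event is a dict with 'op' and 'key', plus 'row' for
    inserts and updates. Real database logs are considerably richer.
    """
    applied = 0
    for event in change_log:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        applied += 1

    # Hypothetical hook: let a downstream transformation know changes landed.
    if on_applied is not None and applied:
        on_applied(applied)
    return target


notifications = []
replica = {1: {"name": "Ada"}}
log = [
    {"op": "insert", "key": 2, "row": {"name": "Grace"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2},
]
apply_changes(replica, log, on_applied=notifications.append)
print(replica, notifications)
```

Because only the changes are read from the log, the source tables themselves do not need to be re-scanned on each run.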
Matillion ETL has been designed to take full advantage of the cloud via features such as native cloud integrations and push-down ELT to major cloud platforms. It is enterprise-ready, enabled in part by its scalability and catalogue support, and it offers a user-friendly interface that incorporates low-code, drag-and-drop techniques for building data pipelines and integration processes.
Moreover, it provides bespoke deployment images for each cloud platform as well as a range of prebuilt connectors, particularly for application-based sources. In addition, the depth of integration offered often goes further than the competition. On the other hand, database and data warehouse support is relatively limited (perhaps as a consequence of that depth of integration).
Matillion Data Loader in particular is a highly fit-for-purpose tool for enabling cloud migrations, one of the more significant data integration use cases at the moment. It is even simpler to use than Matillion ETL, is available as a freemium offering, and is both lightweight and secure (thanks to its hybrid-SaaS deployment model). In other words, it is very easy to start getting value out of it. In turn, because of Matillion Data Loader, Matillion as a whole can address both new implementations of supported data warehouses and cases where companies are migrating to these environments.
The Bottom Line
Matillion ETL offers relatively pure-play ELT, and together with Matillion Data Loader it makes for an effective and easy-to-use solution for data integration, especially in the cloud.